THE INFLUENCE OF MONTH OF BIRTH ON
INTELLIGENCE QUOTIENTS


RUDOLF PINTNER AND GEORGE FORLANO 


Teachers College, Columbia University 


That climate has had and still has a profound influence upon human 
nature is the opinion of many. Some believe such influence to be so 
great as to affect the original tendencies and capacities of the
individual. Speculations like the following are not uncommon: “It should
readily suggest itself that of the numerous peculiarities of mind and 
body for which no explanation can be found in the family strain, no 
few might trace back to climatic influences upon the germ. From 
this quarter also may, in some measure, flow the variations in tem- 
perament among children of the same parents born and reared under 
similar conditions, nor is it impossible that the tendencies in mature 
men and women which lapse so easily into insanity, and lesser psycho- 
logical and even physical derangements, spring in some part from like 
causes.”¹ And in the same year in which this appeared, we find
Tramer² reporting the smallest number of patients born in May and
the largest number in December among thirty-one hundred cases in a 
Swiss hospital for mental diseases. 

There are numerous such suggestions as to the influence of month 
of birth on mentality, but the evidence so far is not impressive and it 
is conflicting. The number of cases considered is generally very small. 
Thus Allen³ reports on only two hundred eminent people. The
greater number were born in the cold months, February being
distinctly the most prolific, with December next in importance. In
the same journal a note by Fairgrieve⁴ comments on the results of
three hundred sixty-eight boys tested with the American Army
Intelligence Tests. It was found that the boys born in late spring were
less intelligent than those born about October. There are several other
references of this type, speculative rather than factual, dealing with
relatively few cases and these not treated with any adequate statistical
techniques.

There is another group of articles of a medical nature suggesting
the influence of sunlight, temperature or a seasonal variation of the
inner secretions or other physiological conditions.⁵ The general
trend of such articles is to suggest the advantage to the growing
organism to be born in the summer months, but again the evidence seems
very meager and somewhat questionable.

Using the intelligence quotients of four hundred fifty-three
backward children Blonsky⁶ found no difference between the means for
summer, autumn and winter, but a higher mean IQ for spring. He
also found a difference between the mean IQ for March to May as
contrasted with the mean IQ for October to December inclusive, and
this difference he regards as significant.

Pintner⁷ followed up this study of Blonsky by collecting forty-nine
hundred twenty-five IQs, about four hundred for each month. He
found differences between the various months and seasons. The
lowest monthly IQ was for November, the highest for October. The
lowest seasonal IQ was for Winter, the other seasons being much
the same. All of these differences were small and none statistically
significant. The largest difference found was that between the warm
months (May to October) and the cold months (November to April).
This amounted to 1.4 IQ points and was 2.66 times the sigma of the
difference. The direction of the difference favors the warm months
and is in line with Blonsky’s original findings. This tendency for the
mean IQ to be lower during the colder months is so persistent that the
matter has been investigated further in the present article with a very
much larger number of cases.

1 Kassel, C.: Birth Months of Genius. Open Court, Vol. XLIII, 1929, pp.
677-695.

2 Tramer, M.: Über die biologische Bedeutung des Geburtsmonates,
insbesondere für die Psychose-erkrankung. Schweiz. Arch. f. Neur. u.
Psychiat., Vol. XXIV, 1929, pp. 17-24.

3 Allen, F. J.: Seasonal Incidence of the Births of Eminent People. Nature,
Vol. CX, No. 2749, 1922, p. 40.

4 Fairgrieve, McC.: Birthdays in Relation to Intelligence. Nature, Vol. CIX,
No. 2729, 1922, p. 218.

5 E.g., Hess, A. F. and M. A. Lundagen: A Seasonal Tide of Blood Phosphate in
Infants. J. Am. Med. Assoc., Vol. LXXIX, 1922, pp. 2210-2212.

6 Blonsky, P. P.: Früh- und Spätjahrkinder. Jahrbuch f. Kinderheilkunde,
Vol. CXXIV, 1929, pp. 115-120.

7 Pintner, R.: Intelligence and Month of Birth. J. Appl. Psychol., Vol. XV,
1931, pp. 149-154.

The Data.—Our data consist of 17,502 IQs. We thus have about 
fourteen hundred for each of the twelve months. The data have 
furthermore been divided in two different ways, namely according to 
social status and according to location. Neither of these divisions 
could be carried out with extreme accuracy. The schools from which 
IQs were obtained were classified into high, medium or low social 
levels. The urban group was divided into three social levels on the 
basis of a socio-economic score card filled out by the children
themselves. The other cases were so divided on the basis of the best
judgments of the writers. The reason for such a division was to enquire
whether the suggested influence of sunlight during the early months of 
life would show itself more strongly at a low social level. Children 
born in poor social surroundings might get a relatively greater amount 
of sunlight and fresh air during the summer as contrasted with the 
winter, when they might be more confined and suffer greater hygienic 
discomfort. The difference between winter and summer might not 
be so obvious at a high social level, where opportunities for getting as 
much sunlight and fresh air as possible would always be present. The 
numbers for each social level are: 


High ................................................ 5586
Medium .............................................. 4171
Low ................................................. 7745


The data were further divided into urban and miscellaneous.
The urban cases were all children attending New York City public
schools, tested by the Bureau of Reference and Research of the New
York City Public School System. It was thought wise to keep this
group separate, because it had been given the same tests in a
systematic city-wide survey. The number of cases in this group is
fifty-eight hundred twenty-two divided into three social levels:


High ................................................ 1931
Medium .............................................. 1849
Low ................................................. 2042


1 We are particularly indebted to two sources for help in the collection of IQs.
Miss May Lazar and Dr. Eugene Nifenecker of the Bureau of Reference and
Research of the New York City Schools allowed us access to their valuable
intelligence survey of the New York City schools. The five thousand eight hundred
twenty-two cases thus obtained constitute our urban group. The other source
was the Educational Records Bureau, New York. Mrs. Eleanor Perry Wood
allowed us to copy about one thousand six hundred IQs and birth months. These
cases were all from independent private schools and were a welcome addition
to our data for the high social level. The rest of the cases, about ten thousand,
were from our own records.


The miscellaneous group consisted of all other cases not included in 
this survey. It contained children in public and private schools in 
New York City and surrounding regions. There were very few rural 
cases. Geographically all the cases were from the eastern part of the 
country. The group is not, therefore, geographically distinct from the urban
group. It differs, however, in that the cases are not restricted to New 
York City, and more particularly in that the children were examined 
by many different tests, by different examiners at different times. 
This miscellaneous group consists of 11,680 cases, distributed into 
levels as follows: 


High ................................................ 3655
Medium .............................................. 2322
Low ................................................. 5703
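The case counts quoted for the two divisions can be cross-checked against the totals for each social level. The short sketch below does this arithmetic; it assumes that the three rows of each list run in the order high, medium, low, which is the order that reproduces the level totals of 5586, 4171 and 7745 given earlier. The variable names are illustrative only.

```python
# Case counts by social level as quoted in the text.
# Assumption: rows of each list are in the order high, medium, low.
urban = {"High": 1931, "Medium": 1849, "Low": 2042}
miscellaneous = {"High": 3655, "Medium": 2322, "Low": 5703}

# Totals per social level, per division, and for the whole sample.
by_level = {level: urban[level] + miscellaneous[level] for level in urban}
urban_total = sum(urban.values())
misc_total = sum(miscellaneous.values())
grand_total = urban_total + misc_total

print(by_level)
print(urban_total, misc_total, grand_total)
```

The sums agree with the figures in the text: 5,822 urban cases, 11,680 miscellaneous cases, and 17,502 cases in all.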


The percentage distribution of IQs for the three different social 
levels and all levels combined is shown in Table I. The total number 
of cases at each level shows a much larger number at the low and high 
social levels than would actually be found in a true cross-section of the 
population, and this is borne out by a study of the distribution of the 
IQs for the total group. The distribution flattens out in the center
from about ninety to one hundred twenty IQ, instead of showing a 
decided peak about one hundred. It is, of course, not necessary for 
this study to have the three social levels represented in the same pro- 
portions as they exist in the general population. The means for the 
three social levels are, however, very much what one would expect, 
considering the positive correlation usually found between IQ and 
social status. This shows that our very rough classification into three 
social levels was in general satisfactory. 

TABLE I.—DISTRIBUTION OF IQS ACCORDING TO SOCIAL LEVEL AND FOR ALL
LEVELS COMBINED

               High status   Medium status   Low status   All levels
     IQ        percentage     percentage     percentage   percentage
  195-200      [illegible]       ...            ...           .01
  190-194          ...           ...            ...           ...
  185-189          ...           ...            ...           ...
  180-184      [illegible]       ...            ...           .01
  175-179      [illegible]       ...            ...           .01
  170-174      [illegible]       ...            ...           .01
  165-169      [illegible]       ...            ...           .02
  160-164      [illegible]       ...            .01           .05
  155-159          .36           .02            .04           .14
  150-154          .59           .29            .21           .35
  145-149          .93           .72            .10           .52
  140-144         2.33          1.27            .37          1.21
  135-139         3.99          2.09            .85          2.15
  130-134         7.48          3.52           1.30          3.80
  125-129        10.19          3.81           2.07          5.07
  120-124        14.29          5.01           3.01          7.08
  115-119        14.36          5.99           4.30          7.91
  110-114        12.76          8.32           5.36          8.43
  105-109        10.17          8.41           7.24          8.46
  100-104         7.50         10.27           9.32          8.96
   95- 99         5.37         11.63           9.72          8.79
   90- 94         3.72         11.65          10.99          8.83
   85- 89         2.09          9.61          10.38          7.55
   80- 84         1.38          6.14           9.28          6.01
   75- 79          .93          4.31           8.16          4.94
   70- 74          .63          3.02           6.92          3.98
   65- 69          .34          1.75           4.78          2.64
   60- 64          .25          1.27           3.19          1.79
   55- 59          .02           .60           1.43           .78
   50- 54          ...           .26            .66           .35
   45- 49          .02           .05            .30           .15
  Number         5,586         4,171          7,745        17,502
  Mean IQ       115.15        100.95          92.50        101.75

Finally, we have made a comparison of the distribution of our
cases by birth-months with that of the birth-months for the
registration area of the U. S., as reported by the Bureau of the Census. We
give below our cases and the mean number of births per month to the
nearest thousand for the United States for fourteen years, from 1915
to 1928. The rank correlation between the months arranged in order
from greatest number of births to least for the two distributions is .88.
We are satisfied, therefore, that the distribution of our cases per month
is close enough to the general distribution of the United States, so that
our findings will not be influenced by any chance peculiarity that
might have entered into our data.
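The rank correlation of .88 can be rechecked from the monthly figures tabulated in this section. The sketch below is one way to do so, not necessarily the authors' procedure: it assumes the classical rank-difference formula, 1 - 6Σd²/(n(n² - 1)), with tied values given the average of the ranks they span. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from largest (rank 1) to smallest, averaging tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_correlation(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# January to December: our cases, and mean U.S. births in thousands (1915-1928).
our_cases = [1559, 1452, 1556, 1492, 1412, 1396, 1532, 1518, 1495, 1406, 1336, 1356]
us_births = [137, 130, 142, 133.5, 136, 131, 138, 140, 136, 133, 124, 127]

rho = rank_correlation(our_cases, us_births)
print(round(rho, 3))
```

Under these assumptions the coefficient comes out at about .886, in line with the reported .88. May and September tie in the United States column, so a different treatment of ties would shift the third decimal slightly.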

                        Our cases                 Mean U.S.A.
Month              Number   Percentage    In thousands   Percentage
January             1,559      8.91           137           8.52
February            1,452      8.30           130           8.09
March               1,556      8.89           142           8.83
April               1,492      8.52           133.5         8.30
May                 1,412      8.07           136           8.46
June                1,396      7.98           131           8.15
July                1,532      8.75           138           8.58
August              1,518      8.67           140           8.71
September           1,495      8.54           136           8.46
October             1,406      8.03           133           8.27
November            1,336      7.63           124           7.71
December            1,356      7.75           127           7.90
Total              17,502      ...          1,607.5         ...

The Tests.—The intelligence tests used with this group of 17,502
cases were very diverse. Some of the more commonly used ones were:
Stanford Revision of the Binet-Simon Scale, Pintner Rapid Survey,
National Intelligence, Haggerty Delta 1, Pintner-Cunningham
Primary, Terman Group, Pintner Intelligence, Otis Primary, Otis
Self-Administering, Pintner Non-Language Primary, Dearborn A and C,
Detroit Primary, Miller Mental Ability.

The children tested were in all grades up to the end of high school,
but the greater number were in the first eight grades. No college
students were included.


TABLE II.—MEAN IQ BY MONTH OF BIRTH
Low Status—Miscellaneous Cases

Month          Mean IQ     SD     SD mean    Range      N
January         91.80    18.45     .81       45-140     514
February        90.30    18.05     .82       45-150     478
March           92.15    18.40     .82       50-155     508
April           92.35    17.45     .82       50-160     453
May             91.35    18.05     .83       45-155     467
June            90.90    16.35     .78       45-150     443
July            92.15    18.05     .83       45-150     477
August          92.10    19.30     .85       45-150     513
September       92.50    17.10     .77       50-150     496
October         91.98    17.10     .80       45-145     451
November        92.30    17.80     .86       45-145     430
December        91.50    16.95     .78       45-150     473

Total           91.80    17.90     .24       45-160    5703

Comparison of Social Levels.—Dividing our data into three levels
with two geographical divisions at each level, we have six tables like
Table II. Since we cannot publish all the tables, this will be a sample
of the tables showing monthly means by level and division. This
table shows the means, standard deviations, standard errors of the
means, total ranges to the nearest five-point interval, and number of
cases. These are the main facts for the miscellaneous division of the
low status group. If we study the monthly means we cannot see any
pronounced tendency for the IQ to increase in summer and decrease
in winter. If there is any such tendency, it must be very slight and
very irregular. A study of the five other tables leads to the same
conclusion. If now we pick out from each of these six tables the lowest
and highest monthly mean IQs, we have the following results:

Group                     Highest mean          Lowest mean                   Difference
Low, miscellaneous        September,  92.50     February,           90.30        2.20
Low, urban                June,       97.25     April or December,  92.70        4.55
Medium, miscellaneous     September,  97.45     July,               94.35        3.10
Medium, urban             July,      110.15     April,             103.10        7.05
High, miscellaneous       September, 117.45     February,          114.95        2.50
High, urban               November,  115.05     January,           110.25        4.80


The standard deviations of these differences have not been calculated,
as we shall give such standard deviations for larger groups presently.
Again there seems to be nothing very definite about this comparison,
although we notice that five of the highest means fall in the warmer
months and only one in a cold month (November); whereas five of
the lowest means fall in the colder months and only one in a warm
month (July).

TABLE III.—MEAN IQ BY MONTH OF BIRTH
Low Status—All Cases

Month          Mean IQ     SD     SD mean    Range      N
January         91.87    18.25     .68       45-145     709
February        91.34    18.25     .72       45-150     640
March           92.58    18.95     .72       50-155     696
April           92.44    18.30     .74       50-160     618
May             92.70    18.95     .76       45-155     626
June            92.49    17.65     .73       45-150     591
July            93.36    18.60     .70       45-150     699
August          92.88    18.80     .71       45-150     693
September       93.00    18.30     .71       50-150     665
October         92.64    17.90     .72       45-145     615
November        92.61    18.40     .76       45-145     578
December        91.77    17.70     .71       45-150     615

Total           92.50     ...      .21        ...      7745

TABLE IV.—MEAN IQ BY MONTH OF BIRTH
Medium Status—All Cases

Month          Mean IQ     SD     SD mean    Range      N
January        100.10    17.60     .91       50-150     377
February        99.49    18.15     .94       50-145     372
March          101.55    18.00     .92       55-150     382
April           99.90    18.45     .98       50-150     352
May            102.05    18.50    1.00       55-150     339
June           102.56    19.70    1.07       45-150     341
July           101.10    17.95     .98       50-155     337
August         100.79    18.15     .95       50-145     362
September      101.35    18.80    1.07       55-150     308
October        101.61    17.75     .93       50-150     362
November       100.89    18.35    1.04       55-145     309
December       100.19    17.85     .98       50-150     330

Total          100.95    18.50     .29       45-155    4171

TABLE V.—MEAN IQ BY MONTH OF BIRTH
High Status—All Cases

Month          Mean IQ     SD     SD mean    Range      N
January        113.47    15.30     .70       45-155     473
February       114.53    15.30     .73       60-160     440
March          114.16    15.65     .71       60-160     478
April          115.56    15.25     .67       60-175     522
May            115.10    15.95     .75       55-195     447
June           114.73    16.10     .75       60-180     464
July           115.54    15.80     .71       60-180     496
August         114.72    15.90     .74       45-160     455
September      116.19    15.40     .67       60-170     522
October        115.83    16.25     .78       65-160     429
November       115.62    15.05     .71       70-155     449
December       115.73    14.45     .71       60-155     411

Total          115.15    15.60     .21       45-195    5586
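The column headed “SD mean” in the monthly tables is the standard error of the mean. As a check on the tabled values, the sketch below assumes the usual formula, SD divided by the square root of N, and applies it to a few rows of Table V (high status, all cases). The paper does not state the formula, so this is an assumption, and the helper name is illustrative.

```python
import math


def se_of_mean(sd, n):
    """Standard error of a mean, assuming SD / sqrt(N)."""
    return sd / math.sqrt(n)


# (row, SD, N, tabled "SD mean") taken from Table V.
rows = [
    ("January", 15.30, 473, 0.70),
    ("September", 15.40, 522, 0.67),
    ("December", 14.45, 411, 0.71),
    ("Total", 15.60, 5586, 0.21),
]

for label, sd, n, tabled in rows:
    print(f"{label}: computed {se_of_mean(sd, n):.3f}, tabled {tabled:.2f}")
```

Each computed value agrees with the tabled figure to within rounding, which supports reading the column as SD/√N.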

Let us now combine the miscellaneous and urban cases and study 
the results for all cases divided into three social levels. Tables III, 
IV and V give the pertinent data. Glancing down the monthly means 
we cannot see any very marked and definite tendency, certainly no 
uniform rise and fall of the IQ as we run through the months of the 
year. In Table III for the low social status, we notice, however, that 
the three lowest IQs are for December, January and February, all
cold months. In Table IV for the medium status, the three lowest
IQs are in January, February, and April, again cold months. In
Table V for the high status, the three lowest IQs are for January,
February and March, again cold months, although June and August
do not differ much from these cold months. Conversely, if we now
take the three months with the highest IQs, we have for the low status,
May, July, September; for the medium status, May, June, October;
for the high status, September, October, December. Here the high
status shows one of the cold months with a high IQ, but on the whole 
the warm months seem to predominate. 

Let us now take the largest differences in IQ between any of the 
months in each of these three tables. We have the following results: 








                                                                      Sigma
Status      Highest mean          Lowest mean          Difference   difference   Ratio
Low         July,      93.36      February,  91.34       2.02         1.04       1.94
Medium      June,     102.56      February,  99.49       3.07         1.41       2.17
High        September, 116.19     January,  113.47       2.72          .96       2.83




















The highest means all fall in warm months and the lowest means in 
cold months. The differences between these means are rather small 
and are scarcely statistically reliable. The interesting thing, however, 
is the agreement at all three social levels. 

All Levels Combined.—Let us now study the monthly mean IQs for 
all our 17,502 cases by combining all three social levels. Table VI 
gives the necessary data. With about 1500 cases for each month our 
means should be fairly stable. The standard deviations of the monthly 
means are small and much the same for each month. If we study 
the means themselves we cannot see any regular tendency to increase 
and then decrease according to the seasonal changes of the year.

TABLE VI.—MEAN IQ BY MONTH OF BIRTH
All Levels Combined—All Cases

Month          Mean IQ     SD     SD mean    Range       N
January        100.40    19.90     .50       45-155     1,559
February       100.35    19.75     .52       45-160     1,452
March          101.25    20.00     .51       50-160     1,556
April          102.40    19.95     .52       50-175     1,492
May            102.10    20.40     .54       45-195     1,412
June           102.60    20.20     .54       45-180     1,396
July           102.20    20.30     .52       45-180     1,532
August         101.40    20.25     .52       45-160     1,518
September      102.60    20.20     .52       50-170     1,495
October        102.04    20.30     .54       45-160     1,406
November       102.25    20.05     .55       45-155     1,336
December       101.20    19.70     .53       45-155     1,356

Total          101.75    20.10     .15       45-195    17,502




















We note, however, that the three lowest means are for December, 
January, and February, whereas the three highest are for April, June 
and September. Since June and September have exactly the same 
mean IQs we have two largest differences between any two months: 





                                                                     Sigma
Highest mean          No.     Lowest mean          No.   Difference  difference  Ratio
June,      102.60     1396    February, 100.35     1452     2.25        .75      3.01
September, 102.60     1495    February, 100.35     1452     2.25        .73      3.06























Here, because of the large number of cases, we have a statistically 
reliable difference between the means, although the actual difference 
is only a little over two points in IQ. 
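The “sigma of the difference” and the ratio used throughout can be illustrated with the two comparisons just given. The sketch below assumes the two monthly groups are independent, so that the sigma of the difference is the square root of the sum of the squared standard errors of the two means; the paper does not spell out its formula, so this is an assumption. The standard errors are the “SD mean” values of Table VI, and the function name is illustrative.

```python
import math


def critical_ratio(mean_a, se_a, mean_b, se_b):
    """Return (difference, sigma of the difference, ratio) for two independent means."""
    difference = mean_a - mean_b
    sigma_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    return difference, sigma_difference, difference / sigma_difference


# Means and "SD mean" values from Table VI.
june_vs_february = critical_ratio(102.60, 0.54, 100.35, 0.52)
september_vs_february = critical_ratio(102.60, 0.52, 100.35, 0.52)

for label, result in [("June vs. February", june_vs_february),
                      ("September vs. February", september_vs_february)]:
    difference, sigma_difference, ratio = result
    print(f"{label}: diff {difference:.2f}, sigma {sigma_difference:.3f}, ratio {ratio:.2f}")
```

Both ratios come out at about 3.0 to 3.1, matching the tabled 3.01 and 3.06 to within the rounding of the published standard errors.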

The actual size of this largest difference is certainly not impressive, 
but the fact that a difference in IQ in favor of the warmer months seems 
continually to appear is puzzling. This general tendency for the 
warmer months to have the higher IQs and the colder to have the lower 
IQs can be seen in Fig. I. This figure shows the monthly curves for 
each of the three social levels. The curves are superimposed, so that 
a direct comparison by levels is possible. Although there are many 
irregularities, the curves as a whole tend to rise in the center, where 
the warmer months are, and drop down at both ends, where the colder
months are. There are notable irregularities. For the high social
level November and December are above many of the warmer months.
August for all levels is lower than it should be, being lower than the
surrounding warm months.¹ Whether any significance can be attached
to this, it is impossible to say. We may note, however, that it does
not drop as low as the colder months in general. January and
February show low means for all three social levels; December is low
for only two social levels. There is, therefore, a general similarity in
the three large samples in spite of individual differences in each sample.

[Fig. 1. Mean IQ for each month for each social level.]








1 It may be of interest to note here that, in a previous article by the senior
author, August had the lowest mean IQ of the six warm months, May to October.
This was not pointed out in that article.
See Pintner, R.: Intelligence and Month of Birth. J. Appl. Psychol., Vol. XV,
1931, pp. 149-154.

Seasonal Differences.—Let us now study the effect upon our data
by combining several months, and we shall begin with the conventional
four seasons of the year. Table VII shows the means and standard
deviations for the three social levels and for the total group.

TABLE VII.—MEAN IQ BY SEASONS
All Cases

                                       Mean IQ                     Standard deviations
Season                       High    Medium    Low    Total     High   Medium    Low   Total
Spring (April-June)         115.25   101.85   92.65   102.35    15.80   18.95   18.30   20.20
Summer (July-September)     115.55   100.80   93.05   102.05    15.70   18.75   18.70   20.25
Autumn (October-December)   115.85   100.80   92.40   101.85    15.30   18.05   18.25   20.05
Winter (January-March)      113.95   100.35   91.85   100.65    15.45   16.80   18.60   19.95

We note immediately that the lowest IQ for all four groups falls in Winter.
Expecting an equal uniformity for the highest mean IQs, we are
disappointed. These fall in Autumn, Spring, Summer and Spring. The
largest differences for the four groups are as follows:








                                                             Sigma
Status      Highest mean        Lowest mean     Difference  difference  Ratio
Low         Summer,  93.05      Winter,  91.85     1.20        .58      2.07
Medium      Spring, 101.85      Winter, 100.35     1.50        .80      1.87
High        Autumn, 115.85      Winter, 113.95     1.90        .59      3.22
Total       Spring, 102.35      Winter, 100.65     1.70        .41      4.14




















Two of our ratios are well above the conventional requirement for 
statistical reliability; the other two are not. The most impressive 
fact, however, is the uniformity of the lowest means in winter for 
all groups, even though the differences in the mean never attain 
two points in IQ. The total group, furthermore, shows a reliable 
difference between winter and any other season. Between winter and 
summer we have a difference of 1.40 IQ points with a ratio of the 
difference to the sigma of the difference of 3.41; between winter and 
autumn the difference is 1.25 IQ points with a ratio of 2.90. Our
results, therefore, seem to point to the depressing effect of winter,
rather than to the stimulating effect of any other one season. Spring,
summer and autumn seem to be equally stimulating in contrast with
winter, the differences between these three seasons being negligible.

Warm and Cold Months.—We may explore further possible
differences in IQ by contrasting warm and cold months. Let us combine
the spring and summer months on the one hand, and the autumn and
winter months on the other. Table VIII shows the results for the
three social levels and for the total group.

TABLE VIII.—WARM VS. COLD MONTHS
All Levels—All Cases

               Mean IQ                Standard deviations
...                                   15.75   18.85   18.45

The lowest IQs in all cases
are for the cold months. The differences are as follows:








                                        Sigma
Status         No.      Difference    difference    Ratio
Low           7,745        0.75          .40         1.87
Medium        4,171        0.80          .57         1.40
High          5,586        0.55          .41         1.34
Total        17,502        1.00          .28         3.57

















The differences for the three social levels do not rise to one point in 
IQ and none of these differences are statistically reliable. For the 
total group the difference is statistically reliable, but it is only one IQ 
point. Evidently the effect of combining seasons is to diminish 
the more marked difference found in the winter months taken by 
themselves. 
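One way to read these ratios, offered here only as a modern gloss, is to treat each as a standard normal deviate and convert it to a two-sided probability. The paper itself speaks only of a conventional requirement for the ratio and reports no probabilities, so both the normal reading and the helper name below are assumptions.

```python
import math


def two_sided_p(ratio):
    """Two-sided tail probability of a standard normal deviate of this size."""
    return math.erfc(abs(ratio) / math.sqrt(2))


# Ratios of difference to sigma of the difference, warm vs. cold months.
ratios = {"Low": 1.87, "Medium": 1.40, "High": 1.34, "Total": 3.57}

for status, ratio in ratios.items():
    print(f"{status}: ratio {ratio:.2f}, two-sided p = {two_sided_p(ratio):.4f}")
```

On this reading only the total group gives a small probability (below .001), while each of the three social levels taken alone stays above .05, which matches the statement that none of the level differences is statistically reliable.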

Hot and Cold Months.—Let us now disregard the conventional 
seasons, and take the six warmest months, May to October, and 
contrast them with the six coldest months, November to April. In 
the previous study by Pintner this six-month division gave the most 
reliable difference. The difference in IQ points was 1.40. Our present
results do not confirm this preliminary finding. Table IX gives the
pertinent data.

574 The Journal of Educational Psychology

TABLE IX.—WARM VS. COLD MONTHS
All Levels—All Cases

                        Mean IQ            Standard deviations
May-October             115.40   ...
November-April          114.78   ...

The colder months in all cases show the lower IQ,
but the differences are very small, and none of them have any high
degree of statistical reliability, as we can see below:














                                        Sigma
Status         No.      Difference    difference    Ratio
Low           7,745        0.80          .40         2.00
Medium        4,171        0.95          .56         1.69
High          5,586        0.62          .40         1.55
Total        17,502        0.85          .30         2.83











Again we note the same effect as before, namely, by combining various 
groups of six months the significantly low IQs of the three winter 
months are practically wiped out. 

Sunshine and Temperature.—The most obvious differences between 
the months of the year are those of sunshine and temperature. Blon- 
sky has argued that sunshine is the potent factor in explaining the 
differences in mean IQ for the various months of the year. We have, 
therefore, compared IQ and number of hours of sunshine per month.
The sunshine figures are means per month over a period of thirty-five
years for New York City.¹ Since most of our cases are from New
York City and vicinity, these figures are probably the best for our 
purpose. The monthly means for sunshine would, in any case, not 
differ relatively for the Middle Atlantic or Northeastern States as a 
whole. Figure II shows the two curves for sunshine and IQ. There 
is evidently some similarity between the two curves. This is
particularly true for the first six or seven months of the year. August
falls too low in IQ, and November and December are much too high
in IQ. If we rank the months for the two variables, we have a rank
correlation between sunshine and IQ of .59.

1 Scarr, J. H.: “Annual Meteorological Summary, New York, N. Y.” U. S.
Department of Agriculture, New York, N. Y., 1931.

Although sunshine and temperature are closely allied, there is 
enough difference to warrant the comparison of IQ and monthly 
temperature. Our temperature data are again from New York City 
and they represent the monthly means for a period of forty-six years.
Figure III gives the two comparable curves. The curves seem to
agree a little better than in the case of sunshine, but again we note 
the discrepancy in the case of August, November and to some extent 
December. The ranks for the months show a rank correlation of .67 
between IQ and temperature. This correlation is somewhat higher 
than the one for sunshine. If sunshine is a potent factor for infants 
during the first month of life, as Blonsky supposes, then temperature 
may determine the amount of direct sunshine to which the infant is 
exposed, and thus we have a higher correlation between temperature 
and IQ than between sunshine and IQ.

[Fig. 3.]

Australian Data.—If temperature or sunshine is directly or
indirectly affecting the IQ, it would be reasonable to suppose that we
would find the same influences at work in the Antipodes. We,
therefore, endeavored to secure IQs and months of birth of Australian
children. Through the courtesy of Dr. K. S. Cunningham¹ we have
obtained three hundred twenty-eight cases. The monthly means 
and number of cases are shown in Table X. It will be noticed at once 
that these data differ from our main data in showing much lower 
IQs. The mean for the total is seventy-five, whereas our total mean 
is one hundred two. The Australian mean of seventy-five is even 
much lower than our mean of 92.5 for our low status group. Evi- 
dently these children are mainly very retarded cases. The range, 
however shows that there is a scattering of high IQs, but these high 





1 We wish to thank Dr. K. 8. Cunningham, Executive Officer of the Australian 
Council for Educational Research, for his kindness in sending us these data. 
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TABLE X.—MONTHLY MEANS OF AUSTRALIAN CHILDREN

Month          Mean IQ     SD      Range       N
January         73.21    15.60    34-103      28
February        77.50    11.95    49-100      30
March           75.00    14.80    42-99       28
April           68.27    11.85    69-90       26
May             75.00    16.45    55-112      23
June            77.40    16.65    35-121      25
July            77.09    15.30    46-109      31
August          81.83    18.95    48-121      30
September       70.97    18.75    34-115      31
October         72.82    19.90    42-108      23
November        77.88    13.60    59-101      26
December        74.07    13.10    43-93       27
Total           75.16    16.20    34-121     328

















IQs are very unevenly distributed. A study of the detailed distribu- 
tion shows one IQ above one hundred ten in May, June and September, 
three in August and none in any of the other months. Our sampling 
of Australian cases is, therefore, very poor.¹ Looking, now, at the 
monthly means, we note the highest mean falls in August, a winter 
month. This is diametrically opposite to our results in the United 
States. The lowest mean falls in April, an autumn month. The 
three highest means are August, November, February; the three lowest 
are April, September and October. There seems to be no consistency 
here, such as we found in our own data. 

The seasonal means for the Australian data are as follows: 





























Seasons                      No.   Mean IQ    SD     SD mean   SD difference   Ratio
Spring (October-December)     76    74.99    15.77
Summer (January-March)        86    75.29    14.26
Autumn (April-June)           74    73.45    15.57    1.81     [illegible]     [illegible]
Winter (July-September)       92    76.57    18.36    1.91
Here we note that the means are all very much alike. None of the 
differences are large or statistically reliable. The largest difference 


1 We are now trying to get a more representative sampling of cases from the 
southern hemisphere. 
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of 3.12 IQ points is between autumn and winter. Winter, however, 
has the largest mean. 
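
The seasonal means above can be recovered from Table X by weighting each monthly mean by its number of cases. The following is a minimal sketch of that pooling, not the authors' own computation; the names `TABLE_X`, `SEASONS` and `pooled_mean` are hypothetical, and small differences in the second decimal place are to be expected because the monthly means are themselves rounded.

```python
# Monthly (mean IQ, N) pairs transcribed from Table X, January through December.
TABLE_X = {
    "January": (73.21, 28), "February": (77.50, 30), "March": (75.00, 28),
    "April": (68.27, 26), "May": (75.00, 23), "June": (77.40, 25),
    "July": (77.09, 31), "August": (81.83, 30), "September": (70.97, 31),
    "October": (72.82, 23), "November": (77.88, 26), "December": (74.07, 27),
}

# Southern-hemisphere seasons as grouped in the text.
SEASONS = {
    "Spring": ("October", "November", "December"),
    "Summer": ("January", "February", "March"),
    "Autumn": ("April", "May", "June"),
    "Winter": ("July", "August", "September"),
}


def pooled_mean(months):
    """Return the N-weighted mean IQ and the total N for a group of months."""
    total_n = sum(TABLE_X[m][1] for m in months)
    weighted_sum = sum(TABLE_X[m][0] * TABLE_X[m][1] for m in months)
    return weighted_sum / total_n, total_n


seasonal = {name: pooled_mean(months) for name, months in SEASONS.items()}

if __name__ == "__main__":
    for name, (mean_iq, n) in seasonal.items():
        print(f"{name:<7} N={n:>3}  mean IQ={mean_iq:.2f}")
```

The same function applied to the six warm and six cold months gives the two means compared in the next table.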


The comparison between the warm and cold months is as follows: 








Months                       No.   Mean IQ    SD     SD mean   SD difference   Ratio
Warm (October-March)         162    75.15    14.99    1.18         1.78         0.02
Cold (April-September)       166    75.18    17.24    1.34























Here the mean difference in IQ points drops to 0.03. 
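
The columns of the table can be checked with a short calculation. The sketch below assumes that "SD mean" is the standard deviation divided by the square root of the number of cases, that "SD difference" is the square root of the sum of the two squared SD means, and that "Ratio" is the mean difference divided by the SD of the difference. These definitions are not spelled out in this section, but they agree with the printed warm and cold figures. The function names are hypothetical.

```python
import math


def sd_of_mean(sd, n):
    """Standard deviation of a mean (assumed here to be SD / sqrt(N))."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (SD mean A, SD mean B, SD of difference, ratio) for two groups."""
    se_a = sd_of_mean(sd_a, n_a)
    se_b = sd_of_mean(sd_b, n_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    ratio = abs(mean_a - mean_b) / se_diff
    return se_a, se_b, se_diff, ratio


# Warm versus cold months, figures taken from the table above.
warm_cold = critical_ratio(75.15, 14.99, 162, 75.18, 17.24, 166)

# Autumn versus winter, the largest seasonal difference (3.12 IQ points).
autumn_winter = critical_ratio(73.45, 15.57, 74, 76.57, 18.36, 92)

if __name__ == "__main__":
    for label, result in (("warm/cold", warm_cold), ("autumn/winter", autumn_winter)):
        print(label, [round(value, 2) for value in result])
```

A ratio this far below conventional thresholds is what the text means by a difference that is not statistically reliable.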

We may sum up the Australian data by saying that our attempt 
to find the same tendencies at work in the southern hemisphere has so 
far been unsuccessful. The one consistent finding in the U. S. data, 
namely the low IQ in the winter months, is reversed in the Australian 
data, where we find the highest IQ in the winter months. However, 
none of the differences in the Australian data are statistically reliable. 
The number of cases is very small and the sampling is probably not 
representative of any particular portion of the population. 

Infant Mortality.—Disease or poor health of the mother or child 
during early life might conceivably so injure the infant as to arrest 
or retard the general development and so lead to lowered intelligence. 
We have, therefore, studied the mortality statistics, since periods of 
greater mortality will coincide with periods of severe illness among the 
survivors. The census mortality tables give us the monthly fluctua- 
tions of the death rate for the total population and also for infants. 
The annual death rates for the total population are in general higher 
for the winter months. In the Thirteenth Annual Report of the 
Bureau of Census¹ we find monthly death rates for the years 1929, 
1928, 1920, 1910, 1900. We have averaged these five annual rates 
and find that January, February and March show the highest monthly 
death rates. The rank correlation of monthly death rates with our 
mean monthly IQs is +.67. 

The infant (children under one year of age) mortality rate by month 
fluctuates somewhat from year to year. We have calculated the 
average monthly infant mortality rate for the nine years, 1921 to 1929. 
These figures show the highest death rates in January, February and 





1 “Bureau of Census, Mortality Statistics, 1929.” Thirteenth Annual Report. 
U. S. Dept. of Commerce, Washington, 1932, p. 6. 
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March and the lowest rates in June, July and August. If we rank the 
months from lowest to highest mortality rate and correlate these ranks 
with the ranking according to IQ from highest to lowest for our data, 
we have a rank correlation of +.66. The highest mortality occurs 
in the months of lowest IQ. The months in which deaths, and hence 
also sickness, are most prevalent are also the months which show the 
lowest IQs for our data. 
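
The rank correlations reported in this section can be computed as in the sketch below, which assumes the usual rank-difference formula, rho = 1 - 6Σd²/(n(n² - 1)). That formula is exact only when there are no tied ranks. The monthly figures in the example are hypothetical placeholders, not the values used in this study, and the function names are illustrative.

```python
def ranks(values, descending=False):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_correlation(x_ranks, y_ranks):
    """Rank-difference correlation computed from two lists of ranks."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical monthly figures, January through December (illustration only).
mean_iq = [99.8, 100.1, 100.5, 101.2, 101.9, 101.5,
           101.0, 101.3, 101.6, 101.1, 100.9, 100.7]
death_rate = [9.1, 8.8, 8.5, 7.9, 7.0, 6.2,
              6.0, 6.1, 6.6, 7.2, 7.8, 8.4]

# As in the text: mortality ranked lowest to highest, IQ ranked highest to lowest,
# so that a positive rho means high mortality goes with low IQ.
rho = rank_correlation(ranks(death_rate), ranks(mean_iq, descending=True))

if __name__ == "__main__":
    print(f"rho = {rho:+.2f}")
```

Ranking the two series in opposite directions, as the text does, is what makes a positive coefficient indicate that the months of highest mortality are the months of lowest IQ.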

This comparison, however, tells us nothing about the liability to 
death and disease of the child according to his month of birth. Chil- 
dren born in January might not show higher mortality rates than those 
born in June or July, although it would still remain true that the 
mothers of children suffered a higher mortality rate in the winter 
months. We have, therefore, taken the neo-natal (first month of 
life) mortality rates for one year (1928). These months of death will 
coincide much more closely with the month of birth of the infant, 
although there will be an overlapping of one month on the other in 
every case. The rank correlation by months for neo-natal mortality 
(1928) between month of death and our IQs by month of birth is 
+.30. For the year in question the highest neo-natal mortality rates 
fell in March, April and December, and the lowest in July, August and 
September. The indication here seems to be that to some extent the 
infants born during the colder months are more subject to death, and 
hence also illness, than those born during the warmer months. 

All of these comparisons are not very satisfactory. What we need 
is the mortality rate according to the month of birth of the infant, not 
according to the month of death as given in the census publications. 
We have not been able to find such an analysis for the country at large, 
but only for certain cities for a short period of time. Woodbury’s¹ 
data have been the most helpful. He makes an analysis of 22,967 
live and 813 still births in eight American cities. Special reports by 
Dempsey² for the city of Brockton and by Rochester³ for the city of 
Baltimore have also been used. If we correlate the month of birth by 
IQ with the infant mortality rate by month of birth of the infant, we 
obtain rank correlations of —.28, —.22 and +.12. There seems little 





1 Woodbury, R. M.: “Causal Factors in Infant Mortality.” Children’s Bureau 
Publ. No. 142. U.S. Dept. of Labor, Washington, 1925, pp. 245. 

2 Dempsey, M. V.: “Infant Mortality in Brockton, Mass.” Children’s Bur. 
Publ. No. 37. U.S. Dept. of Labor, Washington, 1919. 

3 Rochester, A.: “Infant Mortality in Baltimore.” Children’s Bur. Publ. No. 
119. U.S. Dept. of Labor, Washington. 


































580 The Journal of Educational Psychology 


agreement here, owing to the violent fluctuations of the mortality rate 
from month to month. The two months showing the highest rates 
for each of the three sets of data are: April and June; March and April; 
May and August. The months with the lowest rates are: August and 
October; August and September; January and December. Obviously 
we need more data by month of birth before we can come to any 
conclusion. 

The mortality rate during the first year of life (infant mortality) 
may not be so significant as the mortality rate during the first month 
of life (neo-natal mortality). The influences affecting the child which 
lead to a lowered IQ in later life may be more potent at or shortly after 
birth. Woodbury’s analysis of death rates by month of birth for 
various ages of the infant during the first year of life is the only such 
analysis we have been able to find. He gives the monthly death rates 
according to the month of birth of the infant and according to the 
month of life of the infant from the first to the twelfth month. There 
are two important things for us to note in this table. First the tre- 
mendous difference in death rate between the first month of life and 
all the others. The death rate for the first month of life is five times 
greater than that for the second month and about ten times greater 
than that for the eleventh or twelfth month. The first month of life 
is a precarious month for the infant. If he overcomes all the obstacles 
facing him at that time, he has relatively plain sailing thereafter. 
With each month of life from the second to the twelfth the road grad- 
ually becomes easier; the second month being only twice as dangerous 
as the twelfth month. The second noticeable feature of Woodbury’s 
table is the fluctuations of rates for any month of birth as the infant 
grows during the first year of life. For example, in the first month of 
life children born in October are the most favored; their mortality 
rate is the lowest; but during the second month of life children born in 
May are the most favored. Those who suffer most during the first 
month of life are the children born in March, April and January (two 
of these are the winter months where the mean IQ is lowest). We 
have picked out the two most favorable and the two least favorable 
birth-months for each of the twelve months of life from Woodbury’s 
table shown on p. 581. 

We see here great fluctuation of birth months. The same birth- 
month at times is among the most favorable for the infant and then 
swings to the opposite extreme and becomes the least favorable. 
Almost all months appear at some time or other in both columns. 
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Among the least favorable birth-months we find January appearing
five times. No other month appears so often. No other month 
appears more than twice. And January is an unfavorable birth-month
during the first two months of life, when the mortality rates 
are the highest. Among the most favorable birth-months we note 
that December appears most frequently, namely four times. But 
next come January and February, three times each. However, January
appears twice toward the bottom of the list, during the tenth and 
twelfth months of life, when the mortality rates are lowest. 

Month of life   Months with lowest      Months with highest      Rho
                death rates             death rates
First           October, December       January, March           +.24
Second          April, May              January, June            -.14
Third           March, August           May, July                -.34
Fourth          January, February       April, May               -.60
Fifth           July, December          March, April             -.19
Sixth           July, November          January, February        +.62
Seventh         May, November           January, February        +.63
Eighth          April, October          January, December        +.48
Ninth           February, March         November, December       -.26
Tenth           January, August         September, November      -.73
Eleventh        February, December      May, October             -.56
Twelfth         January, December       April, September         -.40

In the fourth column we give the rank correlations between the 
months of birth for each month of life arranged according to the mor- 
tality rate from lowest to highest and the months of birth ranked 
according to mean IQ from highest to lowest. We note how these 
correlations fluctuate back and forth from positive to negative, reflect- 
ing the tremendous changes in infant mortality during the first twelve 
months of life as shown in the Woodbury table. The correlation of 
+.24 for the first month of life is much the same as that of +.30 which 
we obtained, as explained above, from the census data for month of 
death for the first month of life. These results, if they should be borne 
out by more extensive data, point to the suggestion that the first month 
of life is most crucial so far as birth month is concerned. The highest 
positive correlations occur during the sixth, seventh and eighth months 
of life. At this period January, February and December are the least 
favorable birth-months. Children born in these months would be six 
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to eight months of age in the months of June, July and August, when 
diarrhea and enteritis are the most common causes of infant mortality. 
The highest negative correlation, —.73, occurs when the child is ten 
months old, and the unfavorable birth-months are September and 
November. Children born in these months would be ten months old 
in July and September, again two months in which deaths from 
diarrhea and enteritis are high. During the first month of life deaths 
from diarrhea and enteritis are relatively much less frequent than 
during later months of life. The great causes of death among neo- 
nates are premature birth and injury at birth, and both these causes of 
infant mortality appear to occur more frequently during the winter 
months. 

This excursion into the realm of infant mortality has not been 
very satisfactory, but it has been suggestive. It has not been satis- 
factory, because we have not been able to find extensive data classified 
according to the birth-month of the infant. It has been suggestive, 
however, in indicating a probable connection between IQ and months 
of greatest mortality, and so presumably, morbidity. Children, who 
are born of mothers more than ordinarily weighted with sickness, and 
who, during the first month of life are subjected to greater amounts of 
sickness, may suffer a general impairment or lowering of the whole 
constitution which is reflected in later life in a lowered IQ. Such 
seems to be the case with children born in the winter months to a 
greater extent than with children born during the other months of the 
year. Our suggestion here of the effects of early illness needs to be 
investigated. Smith’s¹ study of this topic agrees with our suggestion 
so far as it goes. 

Summary.—Although there are many references in the literature 
to the possible influence of birth-month on mental ability, eminence² 
and other mental factors, the data presented so far are very scanty and 
very unsatisfactory. No clear conclusion emerges. Our present 
attempt has been to gather more comprehensive data than have been 
previously presented. To this end we have tabulated 17,502 IQs. 
The distribution of our cases according to month of birth is very like 
that of births in general in the United States. Our cases have been 





1 Smith, S.: Influence of Illness during the First Two Years on Infant Development.
J. Genet. Psychol., vol. XXXIX, 1931, pp. 284-287. 

2 We are making a study of the birth-month of eminent men to see if this 
approach substantiates the findings of the present study. 






further divided into three social levels and into an urban (New York 
City) and a miscellaneous group. 

A study of these data by months, seasons and by various groups 
of months, shows consistently a lowest mean IQ for the winter months 
(January to March). This is true for each social level and, of course, 
for the total group. The mean difference in IQ between winter and 
the highest seasonal mean is 1.70, and it is statistically reliable. This 
difference is small and would be insignificant, were it not for the per- 
sistency with which it continually appears. 

When we turn to a study of the highest seasonal means, we find no 
consistency in our data. Each social level has a different season for 
its highest mean. For the total group the highest mean is for Spring. 
In this respect our data agree with Blonsky’s data, but because Spring 
is not the highest for each social level, we are inclined to lay no stress 
on this result. Our chief finding is the low mean for winter. 

Monthly differences in mean IQ are very small and one can only 
get statistically reliable differences by comparing the highest and low- 
est monthly means. These largest differences are very small, from 
2.02 to 3.07 IQ points. 

When we take two different comparisons between six cold and six 
hot or warm months, all our differences are reduced. This again 
seems to us to emphasize the importance of our low mean for Winter, 
and the lack of difference in IQ between the other three seasons. 

Ranking the months in IQ and in sunshine and temperature, we 
find correlations of +.59 for sunshine and +.67 for temperature. 
We are very skeptical of the conclusion which Blonsky suggests, that 
this positive relationship is due to the direct influence of sunshine or 
temperature on the new-born infant. We have, therefore, looked for 
other possible influences which also show seasonal fluctuation. Gen- 
eral mortality and infant mortality are such possible influences. Here 
we find winter the season of highest general mortality and presumably 
also general morbidity. Child-birth during this season of the year 
would then be more than ordinarily precarious. Children born then 
might be less healthy. As to infant mortality itself, we have ample 
statistics according to month of death but few that we could find 
according to month of birth. What we have found is a positive 
correlation between monthly IQ and neo-natal mortality (rho = .30), 
and various correlations positive and negative between monthly IQ 
and death rate by month of birth. A positive correlation by month 
of birth for the neo-natal period seems to us most significant (rho = 
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+.24), because it is during this period that the death rate is heaviest. 

Those children born in winter who survive are more likely to be 
impaired. Children born in winter suffer more illness and are born 
of mothers weighted with more illness. This is our suggested explanation
for the lower IQ found in winter. Much needs to be done to 
strengthen this conclusion. 
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A PROCEDURE FOR BALANCING PARALLEL GROUPS 


PHILLIP J. RULON AND CHARLOTTE W. CROON 
Harvard University 


The balancing of groups for experimental purposes may be accom- 
plished either by pairing individuals from the two groups or by pairing 
the two groups as a whole. When the latter method is used, it is 
usualiy desirable that the two groups be paired not only as to mean 
score in the variable under consideration, but also as to variability 
among scores in the groups. 

The simple procedure of discarding high or low scores from one 
group to bring its mean into coincidence with that of the other group 
restricts the dispersion among the scores in the diminished distribution 
and so may produce an imbalance between the two groups in respect 
to variability. The process of constantly recomputing means and 
standard deviations in order to check on the nicety of the intergroup 
balance may be dispensed with if a procedure is employed which will 
bring both distributions into concurrence with a prescription desig- 
nated in advance. The second and fourth columns of Table I give as 
an example the distributions of Terman raw scores obtained by two 
groups of ninth grade students. Since there are more cases in one 
distribution than in the other, it would seem reasonable to drop cases 
from the larger in order to match it with the smaller. But to change 
the mean of the larger group by an amount sufficient to bring it into 
coincidence with that of the smaller would cost more cases than would 
be required to shift both means to some intermediate point. 

The intermediate point may be decided upon by inverse weighting 
of the two group means. Multiplying each mean by the number of 
cases in the other group, adding the two products, and dividing by the 
number of cases in both groups combined, gives 106.68 as a prescribed 
mean to which both distributions in Table I may be made to conform. 
This value differs from the mean of the larger group by 5.32 points and 
from that of the smaller group by only 3.43 points, so the larger group 
will be changed more than the smaller one by the equalizing process. 
Since generally more cases are lost in making larger changes in the 
mean, this is a desirable arrangement. 
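
The inverse weighting just described can be sketched as follows. The group sizes, means and standard deviations are those given in the bottom rows of Table I; the function name `inverse_weighted` is hypothetical. The same function is applied to the two standard deviations, as the text does a little further on.

```python
def inverse_weighted(value_a, n_a, value_b, n_b):
    """Weight each group's statistic by the size of the *other* group."""
    return (value_a * n_b + value_b * n_a) / (n_a + n_b)


# Summary statistics from Table I.
N_A, MEAN_A, SD_A = 530, 101.36, 28.72   # Group A (larger group)
N_B, MEAN_B, SD_B = 342, 110.11, 30.83   # Group B (smaller group)

prescribed_mean = inverse_weighted(MEAN_A, N_A, MEAN_B, N_B)
prescribed_sd = inverse_weighted(SD_A, N_A, SD_B, N_B)

if __name__ == "__main__":
    print(f"prescribed mean = {prescribed_mean:.2f}")
    print(f"prescribed SD   = {prescribed_sd:.2f}")
    print(f"shift for Group A = {prescribed_mean - MEAN_A:.2f}")
    print(f"shift for Group B = {MEAN_B - prescribed_mean:.2f}")
```

Because the larger group receives the smaller weight, the prescribed value lies nearer the smaller group's statistic, so the larger group absorbs more of the change.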
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TABLE I.—DISTRIBUTIONS OF TERMAN RAW SCORES FOR TWO GROUPS

                                      Frequencies
                         Group A                        Group B
Class limits    Originally    To be removed¹    Originally    To be removed
180-4           [illegible]                          5
175-9           [illegible]                          6              1
170-4           [illegible]                          4              1
165-9           [illegible]                          5              1
160-4           [illegible]                          3              2
155-9           [illegible]                          3              2
150-4           [illegible]                          9              3
145-9           [illegible]                         10              4
140-4           [illegible]                         12              5
135-9           [illegible]                         15              4
130-4           [illegible]                         21              3
125-9           [illegible]                         16              2
120-4           [illegible]                         20              2
115-9           [illegible]                         25              1
110-4           [illegible]                         16              1
105-9           [illegible]                         21              1
100-4               40              2               25
95-9                27              6               19
90-4                36             13               16
85-9                35             23               13
80-4                27             26               16
75-9                34             23               11
70-4                20             13               16
65-9                21              6               11
60-4                18              2                9
55-9            [illegible]                          7
50-4            [illegible]                          5
45-9            [illegible]                          3
40-4            [illegible]                          0
N                  530            114              342             33
M                101.36            82            110.11           142
σ                 28.72          8.53             30.83         16.53








1 These entries come from Table II. 

The same inverse weighting process applied to the two standard 
deviations¹ gives 30.00 as a prescribed standard deviation for both 
distributions after matching. 


1 If preferred, the prescribed variability may be measured in terms of the 
variance. In either case the groups will finally be balanced as to standard
deviation as well.
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Now considering either distribution separately, a new distribution 
of mean 106.68 and standard deviation 30.00 is to be obtained by 
removing cases from the original distribution. Let the subscript O 
refer to the original distribution; let R refer to the distribution of cases 
to be removed, and let F designate the final distribution. Then 


$$N_O M_O - N_R M_R = N_F M_F = (N_O - N_R) M_F,$$








and algebraic transposition gives 

$$N_R = \frac{N_O (M_O - M_F)}{M_R - M_F} \qquad (1)$$

and 

$$M_R = \frac{N_O (M_O - M_F)}{N_R} + M_F \qquad (2)$$

where $N_R$ is the number of cases to be removed and $M_R$ is the mean 
score for those cases. 


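Concerning the standard deviations of the original, removed, and 
final distributions, we have 

$$\sigma_F^2 = \frac{\sum X_F^2}{N_F} - \frac{(\sum X_F)^2}{N_F^2}$$

where $\sum X_F^2 = \sum X_O^2 - \sum X_R^2$, and all $X$'s are raw scores. 
Since 

$$\sigma_R^2 = \frac{\sum X_R^2}{N_R} - M_R^2$$

then 

$$\sum X_R^2 = N_R (M_R^2 + \sigma_R^2)$$

and 

$$\sigma_F^2 = \frac{\sum X_O^2 - N_R (M_R^2 + \sigma_R^2)}{N_F} - \frac{(\sum X_F)^2}{N_F^2}$$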





Algebraic transposition finally reduces this, when solved for $\sigma_R^2$, to 

$$\sigma_R^2 = \frac{\sum X_O^2 - (N_O - N_R)\,\sigma_F^2}{N_R} - M_R^2 - \frac{(\sum X_O - N_R M_R)^2}{(N_O - N_R)\,N_R} \qquad (3)$$

which indicates the (square of the) standard deviation of the cases to 
be removed. 
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The procedure in bringing Group A of Table I into coincidence 
with the prescriptions laid down is as follows. The mean of this group 
is 101.36 and is to be raised to 106.68, so cases removed must be below 
the present mean. The farther they are below, the fewer cases will 
need to be removed to raise the mean of those remaining to 106.68. 
The class indexes in Table I are 42, 47, 52, etc. Trying $M_R = 77$ and 
solving (1) for $N_R$ yields $N_R = 95$, showing that 95 cases whose mean 
is 77 will, when removed, leave the remaining distribution with the 
required mean. But substitution in equation (3) yields $\sigma_R^2 = -9.70$, 
which is impossible, as it is of minus sign. This shows that 95 cases 


TABLE II.—FREQUENCY TABLE FOR CASES TO BE REMOVED FROM GROUP A

Lower limit      σ value     Percentage of     Number of
of category                  cases             cases
100               2.08           1.9               2
95                1.49           4.9               6
90                 .89          11.9              13
85                 .29          19.9              23
80                -.29          22.8              26
75                -.89          19.9              23
70               -1.49          11.9              13
65               -2.08           4.9               6
60               -2.68           1.9               2
N                              100.0             114














whose mean is 77 cannot be selected such that their removal will 
leave cases of the variability prescribed: $\sigma = 30.00$. This holds for 
all lower values for $M_R$ as well. 

Setting $M_R = 82$, the value of the next higher class index, gives 
for formula (1) $N_R = 114$. Substitution of this value in formula (3) 
gives $\sigma_R^2 = 72.76$, $\sigma_R = 8.53$. That is, the removal of 114 cases whose 
mean is 82 and whose $\sigma$ is 8.53 will leave a distribution whose mean 
and $\sigma$ are 106.68 and 30.00 respectively. It remains only to remove 
the cases. 
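
The trial-and-error search just described can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' worksheet: the sums of scores and of squared scores are rebuilt from the rounded mean and standard deviation of Group A rather than taken from the raw frequencies, so the resulting variances agree with the printed ones only approximately. The function names are hypothetical.

```python
import math


def cases_to_remove(n_o, m_o, m_f, m_r):
    """Equation (1): number of cases of mean m_r whose removal moves m_o to m_f."""
    return n_o * (m_o - m_f) / (m_r - m_f)


def removed_variance(n_o, m_o, sd_o, sd_f, n_r, m_r):
    """Equation (3): variance required of the removed cases."""
    sum_x = n_o * m_o                        # sum of original scores
    sum_x2 = n_o * (sd_o ** 2 + m_o ** 2)    # sum of squared original scores
    n_f = n_o - n_r                          # cases remaining after removal
    return ((sum_x2 - n_f * sd_f ** 2) / n_r
            - m_r ** 2
            - (sum_x - n_r * m_r) ** 2 / (n_f * n_r))


# Group A of Table I and the prescribed final values.
N_O, M_O, SD_O = 530, 101.36, 28.72
M_F, SD_F = 106.68, 30.00

trials = {}
for m_r in (77, 82):                         # class indexes tried in the text
    n_r = round(cases_to_remove(N_O, M_O, M_F, m_r))
    trials[m_r] = (n_r, removed_variance(N_O, M_O, SD_O, SD_F, n_r, m_r))

if __name__ == "__main__":
    for m_r, (n_r, variance) in trials.items():
        verdict = "admissible" if variance > 0 else "impossible (negative variance)"
        print(f"M_R={m_r}: N_R={n_r}, variance={variance:.2f} -> {verdict}")
```

A trial value of $M_R$ is rejected whenever the variance it implies is negative, which is the criterion applied in the text.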

For this purpose a frequency table may be constructed which, 
when filled with cases, will represent those removed from the dis- 
tribution. Such a table may be made up from the constants of the 
normal curve, or from those of some other, perhaps skewed, curve,¹ 





1 Pearson’s Type III Curve is tabled for various degrees of skewness by L. R. 
Salvosa in Vol. I, No. 2, of the Annals of Mathematical Statistics. 
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as the data in the original distribution may indicate. Using the nor- 
mal curve, the distribution to be removed in our illustrative problem 
is given in Table II. The columns in this table are filled in con- 
secutively. The first column carries the lower limits used in the 
original distribution. The entries in the second (σ value) column 
indicate how many times 8.53 the lower limits of the various classes 
are away from 82, the mean of the distribution. The entries in the 
third column come from a table of the normal curve and indicate what 
percentage of the 114 cases should be in each of the categories, and 
in the last column these percentages are converted to numbers of cases. 
The frequencies of this last column have been entered in the third 
column of Table I. 
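
The construction of Table II can be approximated with the normal distribution in Python's standard library. The sketch assumes that the true class boundaries lie half a point below the nominal lower limits and that the two outermost classes absorb the tails of the curve; both are assumptions that appear consistent with the printed percentages rather than statements from the text. Because the percentages are computed here without the rounding of a printed normal table, they differ from Table II in the first decimal place, and simple rounding of the expected counts need not total exactly 114.

```python
from statistics import NormalDist


def removal_table(lower_limits, mean, sd, n_cases):
    """Return rows of (lower limit, sigma value, percentage, expected cases).

    Boundaries are taken at (lower limit - 0.5); the lowest and highest
    classes are extended to cover the tails so the percentages total 100.
    """
    curve = NormalDist(mean, sd)
    limits = sorted(lower_limits)
    rows = []
    for index, lower in enumerate(limits):
        low_cdf = 0.0 if index == 0 else curve.cdf(lower - 0.5)
        is_top = index == len(limits) - 1
        high_cdf = 1.0 if is_top else curve.cdf(limits[index + 1] - 0.5)
        share = high_cdf - low_cdf
        sigma_value = (lower - 0.5 - mean) / sd
        rows.append((lower, sigma_value, 100 * share, n_cases * share))
    return rows[::-1]   # highest class first, as in Table II


# Removal distribution for Group A: mean 82, sigma 8.53, 114 cases.
rows = removal_table(range(60, 105, 5), 82, 8.53, 114)

if __name__ == "__main__":
    for lower, sigma_value, percent, cases in rows:
        print(f"{lower:>3}  {sigma_value:+.2f}  {percent:5.1f}%  {cases:5.1f}")
```

A skewed curve could be substituted by replacing `NormalDist` with another distribution's cumulative function, which is the alternative the text mentions.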

Removing the cases indicated leaves a Group A distribution in 
which $N_F = 416$, $M_F = 106.66$, and $\sigma_F = 30.01$. These values for 
the mean and standard deviation vary from those prescribed only 
in the second decimal places. 

The application of the procedure to Group B of Table I follows 
similar steps. Since the mean of this group is to be lowered, the cases 
must be removed from the upper end of the group. Trying $M_R$ at 
167 in equation (1) gives $N_R = 19$. But equation (3) shows that 19 
will not do for $N_R$. Trying $M_R = 147$ gives $N_R = 29$ and $\sigma_R = 3.21$. 
This value of $\sigma_R$ is too small for practical use. To take out 29 cases 
whose $\sigma$ is as small as 3.21 and whose mean is 147 is not permitted by 
the distribution of Group B scores (see Table I). Therefore $M_R = 142$ 
is tried. This yields $N_R = 33$ and $\sigma_R = 16.53$. 

The frequency table following these three requirements is now set 
up as above for Group A, and the 33 cases removed. These appear 
in the last column of Table I. The remaining cases fall thus: $N_F = 309$, 
$M_F = 106.71$, $\sigma_F = 30.05$. Finally, therefore, the characteristics 
of the two matched distributions are: In number of cases, 416 and 309; 
in mean scores, 106.7 and 106.7; in standard deviations, 30.0 and 30.0. 

The procedure is particularly economical of time when more than 
two groups are to be equated, or more than one variable is involved, 
or both. 

When more than two groups are to be equated, the prescribed $M_F$ 
may still be determined by inverse weighting. If the numbers of 
cases in the groups are $N_A$, $N_B$, $N_C$, etc., and the sum of all these is 
$N_T$, then the prescribed mean may be set at the arithmetic mean of all 
the products $M_A(N_T - N_A)$, $M_B(N_T - N_B)$, $M_C(N_T - N_C)$, etc. The 


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prescribed $\sigma_F$ may be similarly fixed. The procedure then involves 
simply altering each distribution so that it conforms to $M_F$ and $\sigma_F$. 
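
One way to read this rule for $k$ groups, offered here as an interpretation rather than as the authors' stated formula, is as a weighted mean in which each group's mean carries the weight $N_T - N_i$. The weights then sum to $(k - 1) N_T$, which gives

$$M_F = \frac{\sum_{i} M_i \,(N_T - N_i)}{\sum_{i} (N_T - N_i)} = \frac{\sum_{i} M_i \,(N_T - N_i)}{(k - 1)\, N_T}.$$

For $k = 2$ this reduces to $(M_A N_B + M_B N_A)/(N_A + N_B)$, the two-group rule that produced 106.68 above. Replacing each $M_i$ by $\sigma_i$ gives the corresponding prescribed $\sigma_F$.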

When more than one variable is involved in the matching, the 
number of cases to be removed from each distribution must be chosen 
so that it will be suitable for all variables. And cases must be removed 
which will fit simultaneously into removal distributions for all 
variables. 














THE FACTOR THEORY AND ITS TROUBLES: 
III. MISREPRESENTATION OF THE THEORY 


C. SPEARMAN 


1. THE THEORY AS FORMULATED 


The present article is the third of a series dealing with the diffi- 
culties and objections which have arisen in comparing the theory of 
Two Factors with the results of actual observation. The first article 
showed how such comparisons have often been vitiated by misusage of 
the “probable errors.” The second indicated some fallacious side-tracking
of attention from the better evidence to the worse. The 
present endeavour will be to set forth how the theory itself has been 
fatally misrepresented. Of all the difficulties besetting the theory, 
this third one has been by far the most troublesome. 

The foundation pillars upon which the theory has been built have 
been expounded upon many occasions. See, for instance, an account 
by the writer in the “Psychologies of 1930,” Clark Univ. Press, 1930. 
In greatest brevity they may be enumerated as follows. First come 
the correlations between more or less numerous tests of ability. Then 
follows the observation that these correlations tend toward certain 
characteristic features (such as “hierarchy” and “zero tetrads”). 
Next is required a more correct valuation of these tendencies by making 
allowance for the sampling errors. Fourthly comes the mathematical 
proof that from these tendencies of the correlations we are able to 
build up a doctrine concerning the constitution of the correlated 
abilities. 

Now, this theory it is—and in particular the ensuing doctrine— 
which we here maintain to have been fundamentally misrepresented. 
And in order to preclude all suspicion that we are now in any way 
shifting our ground, we will set forth the exact words that were used 
originally. This occurred as long ago as the years 1904 and 1906; 
that is to say, over a quarter of a century ago. Already at that remote 
period, the general doctrine was expressed in two main theorems, of 
which the first runs as follows. 


(1) All examinations in the different sensory, school, or other specific intellectual 
faculties may be regarded as so many independently obtained estimates of the one 
great common Intellective Function.¹ 





1 American Journ. Psychology, 1904, p. 272. 
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A page or two later in the same publication the same finding was 
repeated in slightly different words. And approximately synonymous 
versions have been frequently repeated ever since. 

This theorem (1), be it observed, concerns itself primarily with only 
one of the “two” factors; namely, “the universal” one, which has been 
symbolized by the letter “g.” It has in view only the simple case 
where the correlation observed between the functions is due to “g” 
exclusively; in other respects than g, the functions are here taken to be 
mutually uncorrelated. 

But such a simple case could not conceivably be the sole one pos- 
sible. Hence the preceding theorem (1) was on the very same page 
followed by another one which attached to it a vital reservation. The 
actual words of this second theorem were: 


(2) Though the range of this central Function appears so universal, and that of 
the specific functions so vanishingly minute, the latter must not be supposed to be 
altogether non-existent. We can always come upon them eventually, if we 
sufficiently narrow our field of view and consider branches of activity closely 
enough resembling one another. When, for instance, in this same preparatory 
school we take on the one side Latin translation with Latin grammar and on the 
other side French prose with French dictation, then our formula² gives a new 
result. The two common elements by no means coincide completely this time, 
but only to the extent of seventy-four per cent; so that in the remaining twenty-six 


per cent each pair must possess a community purely specific and unshared by the 
other pair.³ 


As will be seen, this second theorem concerned itself not so much with 
the universal as with the specific factor. It brought into play what was 
afterwards much discussed as “overlap” and what essentially consisted 
in correlation over and above that due to the universal factor. With 





1¥For instance, ‘‘all branches of intellectual activity have in common one 
fundamental function (or group of functions), whereas the remaining or specific 
elements seem in every case to be wholly different from that in all the others.” 
Ibidem, p. 284. 

It is interesting to note that already here the reservation was made that the 
general factor need not necessarily be anything simple, but might consist of a 
“group” of functions. 

? This original formula was substantially the same as the much later criterion 
of “‘zero tetrads.”’ 

’ See ibidem, p. 272. The said ‘‘community purely specific’”’ derives, of course, 
from a double overlapping; on the one hand, between the specific factors in the 
two Latin abilities, and on the other hand, between the specific factors in the two 
French ones. 
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this extra correlation we pass from the preceding simple case to the 
complex one which involves what have been called “group factors. ”’ 

Two years later this fact, that even the non-universal factor pos- 
sesses a more or less extended range and consequent susceptibility to 
overlap, was pursued on to new territory. For whereas the previous 
examples had been taken from school records, further cases were now 
found and demonstrated in the region of psychological experiment. 
Thus, some common or “overlapping” element or elements (over and 
above the universal factor) were disclosed between the operation of 
counting letters one at a time and that of doing so three at a time. 
Another discovered example was more important still; for it showed 
that—in certain cases at any rate—the overlapping range of the non- 
universal factors need not be so “vanishingly minute” as had been sup- 
posed in (2). On the contrary, it was now revealed that: 


(2a) A rather large group of activities might be sufficiently akin to be brought
together as a more or less unitary power.¹


Factors having a range so much more extended were entitled “broad”
ones; from this early date onwards they, and “range” in general,
supplied our school with its chief topic of investigation.

With the preceding theorems (1), (2), and (2a), the theory of 
Two Factors was completely outlined. By (1) the universal factor was 
announced. By (2) and (2a), the range and overlapping due to the 
other and non-universal factor was indicated. No claim shall here be 
set up that in this original publication about thirty years ago every 
word was the best possible. We should, for instance, have done better
(as became obvious enough from the controversy that afterwards
arose) to replace in (1) “the different” by “sufficiently different”
and the “independently obtained” by “approximately independent.”
Still we do venture to suggest that even without such emendations the 
three theorems (1), (2), and (2a) taken in conjunction with each other 
were tolerably lucid and complete. 

Anyway, whatever may be thought of the theory as formulated in
that aboriginal publication and at that remote period, no reasonable 
doubt would appear to have been left by all our work and publications 
ever since. Our very numerous researches had obviously as their 
main aim to demonstrate and illuminate the more complex case of 
overlap or “group factors” as indicated by (2) and (2a). 





1 Zeitschr. f. Psychologie, Vol. XLIV, 1906, p. 103. 
To crown all and exclude, it was thought, all possibility whatever of
misunderstanding, the two cases, with and without overlap or group 
factors, were symbolized and contrasted as follows. 

[Fig. 1. s_a and s_b do not overlap. Fig. 2. s_a and s_b do overlap.]

In Fig. 1, the circle marked g is the universal or “central” factor,
whilst the circle marked s_a is the non-universal or specific factor;
conjointly, they represent the (vertically shaded) activity or ability a.
Similarly, g and s_b conjointly represent the (horizontally shaded)
ability b; g and s_c, the ability c; g and s_d, the ability d. In this figure,
the independence of the s’s from each other and from the g is indicated
by the fact that none of these overlap each other. But on turning
to Fig. 2, here a and b have in common, not only the g, but also a portion
of their respective s’s. This common portion, indicated by the
double shading, constitutes the overlap or group factor.
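
The contrast between the two figures can be illustrated numerically with the tetrad difference. What follows is only a minimal sketch in Python, not a computation taken from the original publications: the loadings on g and the size of the overlap are invented for illustration, and the correlation between two abilities is assumed to equal the product of their g loadings plus whatever their specific factors share.

```python
from itertools import combinations

# Hypothetical loadings of four abilities on the universal factor g.
G_LOADINGS = {"a": 0.8, "b": 0.7, "c": 0.6, "d": 0.5}


def correlations(loadings, overlap=None):
    """Pairwise correlations: product of g loadings plus any assumed overlap."""
    overlap = overlap or {}
    r = {}
    for x, y in combinations(sorted(loadings), 2):
        extra = overlap.get((x, y), 0.0)
        r[(x, y)] = r[(y, x)] = loadings[x] * loadings[y] + extra
    return r


def tetrad(r, a, b, c, d):
    """Tetrad difference r_ab * r_cd - r_ac * r_bd."""
    return r[(a, b)] * r[(c, d)] - r[(a, c)] * r[(b, d)]


# Case of Fig. 1: the specific factors are independent of one another.
simple_case = correlations(G_LOADINGS)

# Case of Fig. 2: s_a and s_b overlap (assumed here to add 0.20 to r_ab).
complex_case = correlations(G_LOADINGS, overlap={("a", "b"): 0.20})

print("Fig. 1 tetrad:", round(tetrad(simple_case, "a", "b", "c", "d"), 6))
print("Fig. 2 tetrad:", round(tetrad(complex_case, "a", "b", "c", "d"), 6))
```

Under these assumptions the first tetrad vanishes apart from rounding error, whereas the second does not; the departure from zero comes entirely from the overlap between s_a and s_b.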


2. MISAPPREHENSION OF THEORY BY CRITICS 


Anything plainer than the preceding figures would seem hard to
conceive. Accordingly, it is disappointing when we meet authors still 
representing the theory of Two Factors as exclusively consisting of the 
simple case expressed by (1) and Fig. 1, and as having no reference to, 
or even as being contradicted by, the more complicated case of group 
factors expressed in (2), (2a) and Fig. 2. 

For example, Paterson and his collaborators, on finding the tetrad
criterion disturbed by a broad factor of “mechanical ability,” concluded
that through this finding the theory of Two Factors was upset.
In truth, not only was their finding in the best possible accord with this
theory (as was afterwards demonstrated in detail by Edgerton¹) but
it had been specifically anticipated by its proponents; Paterson’s work
appears to have produced nothing that had not been shown before by Cox.

As much may be said of all other attempted upsets of this kind; 
for instance, those that have been based on the discovery of factors
ascribed to memory, verbality, and so forth. In every such case, 
the alleged exception to the Two Factor theory was not only in real 
accordance with this theory, but even had already been discovered by 
its advocates themselves. 

But the most extreme attitude on this score appears to have been 
taken up by Tryon. For this writer has devoted an entire article to 
marshalling ten studies in which to compare the values observed with 
those which, he says, “the two factors expect.” And this expectation
he takes to be exclusively directed to the simple case given in (1) and
Fig. 1. He conveys no indication that the theory with even greater
confidence “expects” the case given in (2), (2a) and Fig. 2.

With publications of this calibre may be contrasted those of other 
authors, who were indeed misled by the somewhat obscure wording of 
the original formulation of the theory, but who afterwards, on the
obscurity being cleared up, frankly submitted the whole matter to 
new consideration and investigation. In every such case, so far as I 
know, the verdict eventually became as favorable as it previously had 
been the reverse. Pre-eminently authoritative stands here the work 
of Dr. William Brown. On re-examining the very experiments which 
he had formerly—and rightly enough from his older interpretation of 
our words—taken as adverse to the theory, he recently wrote as 
follows. 

The results, as far as numbers allow, do support the existence of a central
intellective factor (g), and, when taken in relation with the large body of similar
evidence accumulated during the last twenty years by Prof. Spearman and his
students, help to give a solid basis to that theory.²


And now finally, this verdict reached through revision of his very old 
work has received its most exact verification from the work which he 
has just accomplished and which stands out as a landmark in the whole 
history of the topic.³

1 See the Proceedings of the International Congress of Psychology, Copenhagen, 1932.

2 Brit. Journ. Psych., 1932.

3 His new work is published in the Brit. J. Psychol., 1933.
3. LOCATION OF THE SIMPLE CASE (“HIERARCHY”)


So far, we have been urging that the theory of Two Factors
embraces and has always embraced, not only the simple case of Fig. 1
(where “hierarchy” exists and the tetrads are zero), but also the more
complicated case of Fig. 2 (where the hierarchy fails and the tetrads are
not zero). We seem entitled, however, to go further still and to maintain
that the theory would not necessarily be annulled even if in actual
practice the situation presented by Fig. 1 were always to be replaced
by that of Fig. 2. For even in this more complicated case, the correlations
between abilities might still be best explained by the hypothesis
that each ability is made up of two factors, the one universal, and the
other not so.

But if this were to happen, then no doubt the theory would at any
rate forfeit much of its usefulness. For by employing the simple case
as basis of operations, an easy and fruitful access is afforded to the
complex case. Whereas if the latter had to be approached in any direct
manner, the analysis into factors would suffer gravely from indeterminateness.¹
For this reason some instances of the simple case, if not
indispensable, are at least highly desirable. Accordingly, our school
has always assiduously striven to ascertain whether this simple case
really occurs; and if so, where exactly it is located.

Now, that it does occur with at any rate rough approximation would
appear to have been sufficiently demonstrated a long time ago.
Already by 1914, there had been accumulated the work of “14 experimenters
and 1463 men and women, boys and girls, sane and insane.”
“It came impartially from strong supporters of each doctrine then
prevalent on the matter.” As result, “Out of the several hundreds of
different abilities thus arrayed, only three failed to conform with the
criterion.”²

Subsequently, indeed, the more complicated case has presented
itself relatively much more often. But for this there was an early
explanation. All the later experimenters, alike whether aiming to
support or to oppose the theory, tended more and more to include
the complex case of Fig. 2 on purpose. For most of their investigations
were expressly planned so as to demarcate the all-important line separating
this case from the simple one. Hence, the sets of abilities for
experiment were usually so put together as to afford cases of either
kind.

1 See Holzinger and Swineford: J. Educ. Psychol., 1932, p. 247.

2 The Abilities of Man, by present writer, 1927, ch. X.

There has been, however, a second and still more effective reason 
why the complex case, that of overlap, has grown in relative frequency, 
and indeed will always continue to do so. To discover that no overlap
exists means really nothing more than that there is no overlap
within the limits of the experimental error. As time goes on, these limits
are always being reduced, so as to render manifest further overlap of
smaller amount or more subtle source.¹

Owing to this improvement in method, the whole matter has 
recently made a vital advance. As so often happens in other sciences, 
so here too what had at first seemed simple proved on further and more 
penetrating investigation to be compound. In one specially important 
instance, correlation which had previously been taken to derive purely 
from the universal factor as indicated in Fig. 1 turned out to contain 
also much that came from an overlapping as shown in Fig. 2; the overlapping
content appeared to be of a “verbal” nature. The first definite
experimental data pointing this way were, it seems, those obtained by
Davey.² But the earliest conclusive observations and also the most
adequate appreciation of their bearings must be credited to Stephenson.³

From these and subsequent investigations, it would appear that in
order to get the case of Fig. 1 in its greatest purity, almost all the
current tests of “general intelligence” must be given up. We must
turn instead to those mental activities which are based on the primary
data supplied by our senses.

1 Even this limitation to the rigor of theorem (1) was foreseen from the very
beginning. All the conclusions drawn were explicitly declared to be subject to
their “inevitable eventual corrections and limitations.” Am. J. Psychol., Vol.
XV, 1904.

2 Brit. J. Psychol., Vol. XVII, 1926.

3 J. Educ. Psychol., 1932.


4. SETS OF ABILITIES vs. GENERAL SPHERE OF ABILITY


At this point we may do well to clear up some concepts that are
basal to the present topic and have shown themselves to be dangerously
equivocal. They may perhaps most conveniently be considered with
reference to a letter kindly sent to me by that judicious authority, Dr.
Otis. Quoting the passage given here as a footnote, he asks whether
I would not be willing to replace the words “in every case” by “in
many cases.” Such a substitution, he thinks, would leave scope for
the possibility that:


two or more intellectual activities may have in common one or more “group
factors” which are not general.¹


Then he proceeds to develop his theme in the following interesting 
fashion. 


If your answer to my question is yes, . . . then I would like to ask this question:
“Is there any psychologist who would now dispute that statement? In other
words, are we not now all in agreement that there is a general factor common to
all intellectual activities, that there are many factors each specific to one and only
one ability, and that there are still other factors called group factors that are
common to some but not to all of these?” Ibidem.


Now this substitution proposed by Otis in the formulation of the 
theory does not perhaps convey quite as definite information as 
might be desirable. But so far as it goes, there appears to me 
nothing wrong about it. When, however, he goes on to set forth his 
three categories of factors—general, specific, and group, respectively 
—he brings us up against a matter of great importance though com- 
monly overlooked. This is the distinction between two widely 
different kinds of propositions. The one kind deals with some particular 
set of abilities (such, for instance, as the test series of Binet, or that of 
Thorndike, or that of Otis himself); the other kind treats of ability 
as a whole. Now, as regards any particular set of abilities, the factors
of Otis present no difficulty; they obviously do admit of classification
under his three categories; for every factor in the set must needs be
either common to all the set, or confined to one ability only in it, or else
shared by less than all but more than one. But let us turn to the other
and far more important kind of proposition, that which deals not with 
any particular set but with ability as a whole. Here we can no longer 
characterize a factor as belonging to one of the three categories. For 
any factor may upon occasion be general in the sense of being common 
to all the abilities in some set; any (except the universal one) may be 
confined to one ability only in a set; and so too any (except the uni- 
versal one) may be shared by less than all but more than one. The 
triple classification thus becomes futile (unless, indeed, some limiting 
condition be introduced, and this does not appear to have been done). 
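
The dependence of the three categories upon the particular set chosen can be made concrete with a small sketch in Python. It is offered only as an illustration: the function `classify`, the labels it returns, and the “verbal” factor entering abilities a and b are all hypothetical and form no part of the original discussion.

```python
def classify(factor_abilities, ability_set):
    """Classify a factor relative to one particular set of abilities.

    factor_abilities: the abilities into which the factor enters.
    ability_set: the abilities put together for one experiment.
    """
    chosen = set(ability_set)
    shared = set(factor_abilities) & chosen
    if not shared:
        return "absent"
    if shared == chosen:
        return "general"   # common to all the set
    if len(shared) == 1:
        return "specific"  # confined to one ability only in the set
    return "group"         # shared by less than all but more than one


# Hypothetical non-universal factor entering abilities a and b only.
verbal = {"a", "b"}

print(classify(verbal, {"a", "b", "c", "d"}))  # group
print(classify(verbal, {"a", "b"}))            # general
print(classify(verbal, {"a", "c", "d"}))       # specific
```

The same factor thus falls into each of the three categories in turn, merely through a change in the composition of the set.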

Incidentally, the preceding considerations have brought us into 
contact with a grave equivocality with respect to the term “specific.” 





1 January, 1932. 
By the present writer, this word has usually been employed in opposition
to “universal”; the specific factor in an ability has meant all the
content of that ability other than the universal portion; see, for instance, 
our theorem (2). But many later writers have used the term in a 
more restricted sense; for them, it has become limited to any content 
of an ability that does not occur a second time in the same set of abilities; 
it thus may be described as “non-recurrent”; clearly it cannot be the
general factor, but may be any portion of the specific one. 

To illustrate these two meanings of “specific factor,” we may turn
back to the abilities a and b of our Fig. 2. In the original meaning as
non-universal, the term “specific” applies to the whole of both s_a and
s_b. But in the subsequently introduced meaning as non-recurrent,
the term specific does not apply to the overlapping (doubly shaded) 
portions of these s’s, but only to the non-overlapping (singly shaded) 
portions. 

Yet a third meaning has been given to the term “specific”; for it
has been taken to indicate any factor which, over and above being
non-recurrent in some particular set of abilities, cannot occur again
in any other ability whatever. To get an unequivocal name for such
a factor (whether or not anything of the sort really exists) we might
call it “unique.”

On the whole, then, there are three candidates for the title of 
“specific factor”; above, these have been distinguished by the respective
names of “non-universal,” “non-recurrent,” and “unique.”

Intimately connected with the preceding confusion, is a complaint 
which has played no small part in controversy, although at bottom it 
is not so much of scientific as of titular and personal nature. This is 
to the effect that since, as we have seen, an ability may possess factors 
in indefinitely large number (universal, general, non-recurrent, unique, 
and so forth), therefore the theory has no good right to entitle itself 
that of only “two” factors. 

But against this complaint it must be said that all such large multiplication
of factors in an ability does not arise from renouncing its
primary and fundamental bisection into the original two factors,
the universal and the non-universal; it only comes from submitting the
non-universal factor to further and secondary sub-division. Take
as a typical illustration the specific factor s_a in Fig. 2; the figure shows
it sub-divided into the doubly shaded or recurrent component and the
singly shaded or non-recurrent component.
Moreover, this secondary subdivision, unlike the primary bisection,
is unstable. Thus in our preceding instance the motive for subdividing
s_a in this particular manner was the fact that the set of abilities
at issue contained besides a also b. If we alter the composition
of the set by eliminating b, the overlap or group factor as such disap- 
pears. Speaking generally, all these sub-divisions of an ability depend 
on what other abilities we choose to put into one and the same set; 
they therefore come and go at our will. Whereas the primary bisec- 
tion into universal and non-universal factors remains inviolate; it 
is not dependent on any chance composition of a particular set of 
abilities, but instead marks the most fundamental feature in ability 
as a whole. 


5. COMPARISON WITH PHYSICAL SCIENCE 


The preceding considerations about the theory of Two Factors 
lead us up to yet another charge that has been made against it. This 
consists in urging that, if the theorem (1) is to a large extent nullified 
by the reservations (2) and (2a), then it cannot rightly claim to 
constitute any general law; and, failing this virtue, it ceases to be 
scientific. 

But in reply, we may appeal to physical science and point out that 
even here the laws are usually subject to drastic reservations. For 
example, one of the most important of all physical laws is that of Ohm,
whereby the number of amperes is declared to be directly proportional
to that of volts and inversely proportional to that of ohms. Yet even
this law is by no means valid under all
conditions. To quote from a recognized textbook:


It is true only for constant currents. It does not apply where electrical energy 
is being transformed into other than heat energy. Circuits in which the law does 
not apply are those in which matter is being decomposed by the current, motors 
are being run, or magnetic fields are being set or destroyed. (First Course in
Physics, Millikan, Gale, and Edwards.)


Indeed, some of the most important physical laws are not realized
perfectly under any conditions; for instance, no planet actually travels
in a perfect ellipse. To regard it as doing so is only to create a fictitiously
simple case for convenience of thought. Other physical laws
are not under any condition realized even approximately. For
instance the actual course of an object set in motion does not in general
so much as approximate to any uniform speed in a straight line.
Nevertheless the uniform speed and the straightness of line have
faithfully served for centuries as the theoretical basis of physical
theory.

On the whole, one cannot but feel that the said charge against the 
theory of Two Factors of not being scientific is inclined to be captious 
and might with scientific advantage be replaced by less facile and more 
constructive work. 


SUMMARY 


In the preceding paper, an examination has been made of the 
complaint sometimes brought against the theory of Two Factors, that 
it fails to be corroborated by actual observation. As chief source of 
the complaint, we have found that the theory has been fundamentally 
misunderstood. The complainants have committed the cardinal error 
of taking it to deal only with the simple case where the correlations 
show zero tetrads (or, in older terms, “hierarchy”). Whereas in
truth a function of the theory just as aboriginal, and even more 
essential, is to deal with the complex case where the tetrads are not 
zero. 

Incidentally, we have had occasion to clear up several ambiguous
terms, such as “specific factor,” which are at present causing much
confusion, fallacy, and conflict.
AN EXPERIMENT IN INDIVIDUAL TRAINING OF
PITCH-DEFICIENT CHILDREN 


MANUEL WOLNER AND W. H. PYLE 
Graduate School, Colleges of the City of Detroit 


Certain psychologists, notably Dr. C. E. Seashore,* have held 
that pitch discrimination is little affected either by age or practice, 
and that consequently children deficient in ability to discriminate 
pitch remain deficient in spite of training. Many teachers of music 
have expressed skepticism of this conclusion and have claimed to have 
improved children in ability to distinguish pitch differences. The 
writers, therefore, decided to make a thorough and intensive test of 
the matter. 

As the first step, we asked the music teachers in three Detroit 
elementary schools to select pupils showing the greatest deficiency in 
pitch discrimination. These pupils constituted the subjects of our 
experiments. From this group we further selected the seven poorest. 
Our criterion of selection was the ability to distinguish pitch differences
on the piano and to distinguish pitch differences between pairs of 
Whipple forks. The widest difference on the forks is a difference of
thirty vibrations. No pupil selected for experimentation could distinguish
the thirty vibration difference on the forks. On the piano,
in general they could not distinguish differences of the octave, fifth,
third, whole-tone, or half tone. None of them could sing, although
they had been in the music classes from the first grade, and were at
the time of the experiment in the fifth, sixth, and seventh grades.
The pupils chosen consisted of three boys and four girls. Some were
of superior, some of medium, and some of poor intelligence. Our
subjects, then, were seven pupils who had been under musical instruction
in the Detroit schools for five, six, or seven years and had not
learned to distinguish one pitch from another, and of course could not
sing.

* A bibliography will be found at the end of this article, and a full and detailed
account of the experiment in the form of a graduate thesis by Manuel Wolner is in
our college library.


EXPERIMENTAL


Our problem was to see if these pupils could be taught to distinguish
pitch and to sing. Training was begun on February 8, 1932, and continued
until April 28. Each pupil received individual instruction and
tests for twenty minutes each morning, five days a week. The whole
number of hours spent in training was on the average sixteen for each
pupil, and extended over a period of eighty-one days.
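
The figures just given can be checked against the calendar. The following Python sketch is an illustration only; it assumes that a session was possible on every Monday through Friday of the period and takes no account of holidays or absences, neither of which is recorded here.

```python
from datetime import date, timedelta

START = date(1932, 2, 8)   # first day of training
END = date(1932, 4, 28)    # last day of training
MINUTES_PER_SESSION = 20

# Length of the period, counting both the first and the last day.
days_inclusive = (END - START).days + 1

# Number of Mondays through Fridays in the period (assumed session days).
weekdays = sum(
    1
    for offset in range(days_inclusive)
    if (START + timedelta(days=offset)).weekday() < 5
)

# Upper bound on training time if no session was ever missed.
max_hours = weekdays * MINUTES_PER_SESSION / 60

# Number of twenty-minute sessions that make up sixteen hours.
sessions_for_16_hours = 16 * 60 // MINUTES_PER_SESSION

print(days_inclusive, weekdays, round(max_hours, 1), sessions_for_16_hours)
```

On these assumptions the period spans eighty-one days inclusive, and the reported average of sixteen hours corresponds to forty-eight sessions, somewhat fewer than the number of school days available.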

The definition and meaning of pitch or high and low as distinguished
from intensity, duration, and timbre were strongly and repeatedly
emphasized, particularly in the early days of training. The
pupils were led to see the necessity of thinking of tones as one would
think of a problem. Various forms of imagery were recalled to induce
and suggest the analogy. Interest, attention, and concentration were 
aroused to high levels by enlightening the pupil as to the amount of 
good to be obtained not only in music, but in general, through attain- 
ment of a high degree of acuity. 


TRAINING WITH THE PIANO 


Training at the piano occupied the greater part of the daily train- 
ing period during the early stages of the experiment. We felt that the 
use of the Whipple forks would be of little value, until the pupils 
learned to distinguish the larger intervals of the piano. Before the 
pupils could distinguish the smaller intervals, the larger ones had to 
be mastered. 

To acquaint the pupils practically with the meaning of pitch and 
to get some foundation on which to work, the pupils were trained to 
reproduce vocally middle C as heard on the piano. This training 
required almost infinite patience and direction. After the reproduction
of C was mastered with a fair degree of certainty, the pupil was
trained to reproduce D. Then drill would follow on the sequence 
C-D-C-D, and on up and down the major scale, depending on the 
amount of improvement in each case. The pupil was repeatedly cau- 
tioned to listen first, to get a mental picture, then to reproduce as 
accurately as possible. To guard against fatigue and loss of attention, 
a shift to the chromatic scale would be made. This relieved monotony 
somewhat and aroused new interest. 

The significance of this form of training consists in the fact that it 
opens up a new form of experience for the pupil. It helps him to sing 
in tune, and broadens his conception of high and low pitch to the 
extent that he feels the muscles of his vocal organs tightening or 
relaxing as the case may be. He thus demonstrates to himself practically
the meaning of pitch and pitch differences. As improvement
was noted, the scale was extended above and below the octave. After
further improvement, the pupil was drilled in singing the intervals
C-E, E-G, C-G, and C-C. In singing these exercises, the notes were
sung “la,” thereby minimizing inhibition and providing facilitation.
Along with this voice reproduction, drills and practice in the discrimination
of these intervals, whole tones and half tones, were constantly
given. When an answer was wrong, the pupil was obliged to sing the
two tones and determine for himself the correctness of his judgment.

As the experiment proceeded, changes and innovations were introduced
to provide new incentive and to continue attention at a high
level. The words “high” and “low” were substituted for “la.”
Then the pupil would sing in one register, while the experimenter
would continue to play the same notes and intervals (octaves, fifths,
thirds, whole tones, and semi-tones) in all the registers through three
octaves above and one and one-half below middle C, frequently calling
the pupil’s attention to the purpose sought. This particular exercise
encompassed a wide range of tones, and was utilized to increase musical
understanding, which is a great aid to the comprehension of pitch
discrimination.

Further progress warranted the singing of major and minor scales,
diatonic and chromatic, their principal intervals, and skipping at
random from one note to another. For pupils making the most
progress, songs were introduced.


TRAINING WITH THE FORKS


The Whipple forks used in this experiment consisted of a standard
A fork (four hundred thirty-five vibrations per second) and ten other
forks higher than the standard by 30, 23, 17, 12, 8, 5, 3, 2, 1, and .5
vibrations per second. The forks were used in the manner standardized
by Whipple.* The use of the forks supplemented that of the
piano. In training with the forks having 30, 23, and 17 vibration
differences, the pupils were required to reproduce the tones vocally
as they had done in using the piano. The same training and innovations
were used in working with the forks. A form of drill much used
was as follows: The experimenter would strike the low fork and the
pupil would sing “low”; again the experimenter would strike the
second fork and the pupil would sing “high,” and then utter the word
“higher,” or conversely. An added variation was to have the experimenter
strike a fork, the pupil to sing it “la,” then the experimenter to
strike the second fork, the pupil to sing it “la,” then sing “low-high,”
and then utter the word “higher” or vice versa. This exercise was used
to build up independence and establish the self-reliance of the pupil.

* Whipple, G. M.: Manual of Mental and Physical Tests. 1910, p. 180.
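
To relate the fork differences to the piano intervals used in training, the differences in vibrations per second may be converted into cents (hundredths of an equal-tempered semitone). The Python sketch below is an illustration only; the conversion into cents is a standard musical measure and is not part of the procedure described above.

```python
import math

STANDARD_VPS = 435.0  # the standard A fork, vibrations per second
DIFFERENCES_VPS = [30, 23, 17, 12, 8, 5, 3, 2, 1, 0.5]


def cents_above_standard(difference_vps, standard_vps=STANDARD_VPS):
    """Interval between the standard fork and a higher fork, in cents."""
    return 1200 * math.log2((standard_vps + difference_vps) / standard_vps)


for diff in DIFFERENCES_VPS:
    cents = cents_above_standard(diff)
    print(f"{diff:>4} vibrations: {cents:6.1f} cents ({cents / 100:.2f} semitone)")
```

On this reckoning the widest fork difference, thirty vibrations, is only a little more than a semitone, and the narrowest, one-half a vibration, is about two cents.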


RESULTS AND CONCLUSIONS 


All seven pupils learned to discriminate perfectly the intervals of
octaves, fifths, thirds, whole tones and semi-tones in the range from A,
sixteen tones below middle C, through A, thirty-four tones above middle
C, a tonal range of forty-nine semi-tones or four octaves.

The progress was unequal from every point of view. There was 
great variability in the time required to reach a given degree of effi- 
ciency, and in the response to methods and changes of method. Indi- 
vidual differences presented themselves continually. The need for 
recognition of these differences and the proper remedial application 
were turning points for the success of the experiment. But of chief 
importance was the patience and perseverance required in the face of 
discouraging results, especially with the forks, in the early weeks of 
training. Previous experiments to train pitch-deficient children have 
probably failed because the experimenter did not persist beyond the 
early stages of training. 

With the forks, four of the pupils became perfect in distinguishing
all the pitch differences from the largest, thirty vibrations, down to the
smallest difference, one-half a vibration. Of the other three pupils,
one learned to distinguish perfectly down to two vibrations difference; 
another down to three vibrations difference; and the third, down to 
five vibrations difference. Our standard of perfection in the fork 
experiments was correct judgments in ten out of ten trials, and not in 
eight out of ten as has been the practice in using this experiment. 
With the standard of eight correct judgments out of ten trials, prac- 
tically all of our subjects became able to distinguish all the fork 
differences. 
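
The severity of the two standards may be compared by asking how often each would be met by mere guessing. The Python sketch below is an illustration only; it assumes that every judgment is a choice between two alternatives (“higher” or “lower”), that a guess is right half the time, and that the ten trials are independent, none of which is stated in the account above.

```python
from math import comb

TRIALS = 10
P_GUESS = 0.5  # assumed chance of a correct guess on a single trial


def chance_of_at_least(correct, trials=TRIALS, p=P_GUESS):
    """Binomial probability of at least `correct` right answers by guessing."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(correct, trials + 1)
    )


print("ten out of ten by chance:     ", chance_of_at_least(10))
print("eight or more of ten by chance:", chance_of_at_least(8))
```

Under these assumptions the standard of ten out of ten would be met by guessing about once in a thousand series, and that of eight out of ten about once in eighteen.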


Each pupil improved noticeably in ability to sing. At the conclusion
of the experiment, one pupil sang the words and music of several
songs with no trace of pitch deficiency; and also sang major and minor 
scales, chromatics, intervals, and tones picked at random. Another 
sang scales and intervals and the music of a song without words. Two 
pupils sang scales and intervals. The other three pupils sang, not 
perfectly, but with tremendous improvement over their initial efforts. 
Where formerly these pupils had no conception of pitch or range, they 
now have open to themselves the possibilities of music.
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The results of our experiment amount to this: We have taken seven 
of the worst pitch-deficient pupils found in three Detroit elementary 
schools, and by three months of training in which we used the piano, 
tuning forks, and vocal exercises, have trained them to distinguish 
! pitch with considerable accuracy and to sing. The indications were 
that if training had continued, those pupils poorest in pitch discrimina- 
tion would have reached as great an accuracy in distinguishing pitch 
as is common in the general population. That all pitch-deficient 
children can be trained to distinguish pitch, our experiment does not, 
of course, prove. But considering the manner in which our subjects 
were selected, and the fact that improvement was great in every case, 
the conclusion seems probable that most pitch-deficient children can
be trained to distinguish pitch with considerable accuracy. If this 
conclusion is correct, such children cannot be lacking in the
pitch-distinguishing mechanism of the inner ear. It seems probable that the
failure of such children to learn to distinguish pitch in the ordinary 
school instruction is due to some failure in method, just as some chil- 
dren fail to learn to read under the ordinary group instruction but suc- 
ceed under the proper remedial instruction. Of course, some children 
fail to learn to read because of grave anatomical or physiological 
derangements in the brain. Likewise, some children may not learn
to distinguish pitch because of serious neural derangement, or anatomi- 
cal deficiencies in the inner ear. But there was no such case among 
those with whom we experimented. The results of our experiment, 
therefore, force us to conclude that the opinion rather generally held, 
that inability to distinguish pitch is due to some native structural 
defects in the hearing mechanism and cannot be affected by training 
or practice, is not correct. Indeed it may turn out to be true that even 
children with deficient ear mechanisms can be trained to distinguish 
pitch. 
THE INFLUENCE UPON ACHIEVEMENT OF A
KNOWLEDGE OF PROGRESS


C. C. ROSS 


University of Kentucky 


It has been an epoch-making half-century now since psychology 
first definitely broke away from its family roof, philosophy, and set 
up in business for itself as an experimental science. From the begin- 
ning, learning behavior, the study of man’s attempts to adapt himself 
to the world about him, occupied prominent attention. 

However, one important aspect of the learning process has con- 
tinued to remain largely a baffling mystery, serving alike to testify 
to the bewildering complexity of the human organism and the discour- 
aging impotence of man’s mind to fully comprehend itself. This 
aspect of learning is motivation. What is the nature of the drive, 
urge, impulse, desire—call it what you will—that lies back of the learn- 
ing act, that makes the individual want to do it, that pushes him out, 
so to speak, to meet his environment? 

Upon this problem psychology has made its least convincing experi- 
mental approach, and is most inclined to listen with open eyes to the 
ancient folklore of its parent, philosophy. It is true that some evi- 
dence of a growing discontent with this type of bedtime story has 
appeared and a few scattering experiments have been made that bear 
upon the problem. One of these has been the study of the knowledge 
of progress as a motivating factor in the learning process. 

One of the most comprehensive and best known of these studies 
was that of Book and Norvelle¹ a decade ago. The results for one
hundred twenty-four juniors and seniors at Indiana University in
making legible “a’s” are presented graphically in Fig. 1. It will be
noted that the group with knowledge of progress excelled the other in 
each of the first ten practice periods, but lost this advantage abruptly 
when the knowledge of progress was withheld. This seemed to estab- 
lish clearly, in this situation at least, the motivating effect in learning 
of a knowledge of progress. 

For the past several years the writer has been interested in this 
same approach to the problem of motivation. His earlier experi- 
ments involved such dissimilar functions as adding the same number 


1 Book, W. F. and L. Norvelle: An Experimental Study of Learning Incentives.
Pedagogical Seminary, Vol. XXIX, December, 1922, pp. 305-362.
successively to a given number, or subtracting the same number
successively from a given number, for a specified length of time; judging
the time interval between two sounds; and simple motor functions
such as gripping the dynamometer, and making “tally marks.” The
technique employed in all of these studies was similar to that used by 
Book, with the exception that three groups were used instead of two, 
and that the experimental and control groups were equated upon the 
basis of an initial practice period in the function to be studied. One
group practiced with a full knowledge of progress after the first trial.
In this group, before making the next trial, each student was shown his
results in the preceding trial, together with a frequency distribution of
the scores of the group. Another equivalent group was given vague
information as to progress. In this group each student was told
merely that he was above or below the average of his group. A third
group practiced without being given any knowledge whatsoever as to
progress. At the end of ten practice periods conditions were reversed
for from two to five trials.

Fig. 1. Book’s Experiment in Making Legible a’s, Comparing Groups
with Full Knowledge and No Knowledge of Progress.

As the conditions for all these studies were essentially the same and 
the results similar, only one will be summarized here, and that briefly.¹





1 For a fuller account of this experiment, see: Ross, C. C.: An Experiment in 
Motivation. The Journal of Educational Psychology, Vol. XVIII, May, 1927, 
pp. 337-346. 
This experiment involved learning to perform a simple motor act,
namely, making a group of four vertical lines and crossing them with a 
fifth, as in the familiar device known as “tallying.” The subjects 
were fifty-nine college students, of whom twenty-eight were freshmen. 
The results are presented in Table I, and graphically in Fig. 2. It will 
be noted that after the third trial, the group with full knowledge of 
progress forged steadily ahead of the other two groups, while the group 
with no knowledge made the poorest showing of all. 


TABLE I.—MEAN SCORES FOR TWELVE TRIALS IN MAKING TALLY MARKS, COMPARING
GROUPS WITH FULL, VAGUE, AND NO KNOWLEDGE OF PROGRESS

Group status                          Mean score on each trial                                  Group status
(trials 1-10)      1     2     3     4     5     6     7     8     9    10    11    12     (trials 11-12)
Full knowledge    35.6  44.8  50.5  51.4  53.4  56.2  54.9  57.2  57.7  57.?  ?.7   61.6   No knowledge
Vague knowledge   35.7  45.2  47.8  50.3  50.1  52.0  51.6  52.6  53.8  54.?  ?.0   58.1   Full knowledge
No knowledge      35.0  44.7  46.6  49.0  51.1  52.0  51.7  50.8  51.8  53.?  ?.7   56.5   Full knowledge
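
The lead of the full-knowledge group can be read off the table directly. The sketch below merely subtracts the printed means; it uses only trials 1 to 9 and trial 12, since the entries for trials 10 and 11 are not fully legible. The variable and function names are illustrative, and no significance test is implied.

```python
# Mean scores as printed in Table I. Trials 10 and 11 are omitted because
# their digits are not fully legible. Trial 12 falls after the reversal of
# conditions, when the original full-knowledge group received no knowledge.
trials = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]
full = [35.6, 44.8, 50.5, 51.4, 53.4, 56.2, 54.9, 57.2, 57.7, 61.6]
vague = [35.7, 45.2, 47.8, 50.3, 50.1, 52.0, 51.6, 52.6, 53.8, 58.1]
no_knowledge = [35.0, 44.7, 46.6, 49.0, 51.1, 52.0, 51.7, 50.8, 51.8, 56.5]


def lead(a, b):
    """Trial-by-trial difference in mean score, rounded to one decimal."""
    return [round(x - y, 1) for x, y in zip(a, b)]


full_over_none = lead(full, no_knowledge)
full_over_vague = lead(full, vague)

for t, d_none, d_vague in zip(trials, full_over_none, full_over_vague):
    print(f"trial {t:2d}: full - none = {d_none:+.1f}   full - vague = {d_vague:+.1f}")
```

On the first two trials the three groups are practically equal; from the third trial onward every printed difference favors the group that began with full knowledge.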



































The conclusion, based with some confidence upon the results from 
this and other similar experiments, was as follows: 

A knowledge of the learner’s own individual progress, both relative and abso- 
lute, and that of the group of which he is a member, is sufficient to give him a 
distinct superiority over competitors, and the degree of superiority is roughly 
proportional to the amount of knowledge possessed. 

While these findings are in substantial agreement with Book’s 
during the first ten trials, the results reported here after conditions 
were reversed are distinctly different. As will be seen in Fig. 1, Book 
found a sharp decline in the performance after the motive was dropped. 
The present writer never found any evidence of this letting down in 
any of his curves. On the contrary, he was convinced that the moti- 
vating effect of a knowledge of progress, when allowed to operate 
undisturbed, was relatively continuous, and measurably self-sustain- 
ing, persisting undiminished through the remaining practice periods 
in spite of the fact that no further knowledge of progress was given. 
Indeed, the writer strongly suspected that Book’s admonition to his 
subjects, after the knowledge of progress was withdrawn, that they 
were to ‘‘banish all thought and desire for improvement as such from 
their minds,” may have served as a positive suggestion to slacken their 
efforts that was far more potent than the negative effects of withhold- 
ing further knowledge of progress. 

On the whole, the results of these experiments appeared so convincing
that practically all textbooks in educational psychology that
have appeared in the last five years have quoted them with approval. 
In the meantime, however, it occurred to the writer that after all 
these experiments were performed under laboratory conditions and 
that the results ought to be “confirmed” under actual classroom 
conditions. 

At the beginning of the Spring Semester, 1931, at the University
of Kentucky the writer found himself confronted with a class of eighty-three
students in “Tests and Measurements,” whose basic text was
Ruch’s “The Objective or New-type Examination.” These students
were mainly juniors and seniors, although there were a few graduate
students. This appeared to present the opportunity desired.


Therefore, upon the basis of a comprehensive objective test, whose
reliability was .90, based upon Part I of the text, four substantially
equivalent groups of twenty students each were formed in this class,
each student in the control group being “paired” with a similar student
in each of the experimental groups.¹ During the remainder of the
semester objective tests whose median reliability was .66 were given 
once a week to all the students. However, since comparisons between 





1 Since two students dropped out of school during the semester, it was necessary
to disregard the record of their “pairs” also. This left seventy-two students whose
records could be used in the study.

groups are based upon cumulative scores, the reliability of the measurement
at any point is not simply that of the single class test but reflects
that of all tests to date. 
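
How much the cumulation of tests may raise reliability can be estimated with the Spearman-Brown formula. The writer does not name this formula, so its use here is an assumption; it further supposes that the weekly tests are parallel and that each has the median reliability of .66. The function name `spearman_brown` is illustrative.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Estimated reliability of the sum of k parallel tests,
    each having reliability r_single (Spearman-Brown formula)."""
    return k * r_single / (1 + (k - 1) * r_single)


# Cumulative reliability after 1, 4, 7 and 11 weekly tests of reliability .66.
cumulative = {k: spearman_brown(0.66, k) for k in (1, 4, 7, 11)}

for k, r in cumulative.items():
    print(f"after {k:2d} tests: estimated reliability {r:.3f}")
```

On these assumptions the cumulative score after the seven tests preceding the reversal would have a reliability of about .93, well above that of any single weekly test.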

The distribution of scores of the entire class was placed on the 
board and the items on the test missed by any considerable number of 
students were discussed after each test. However, one group was 
given no knowledge whatsoever as to its progress, either individually 
or as a group. A second group was given vague information, each 
student being told simply that his score was “good,” “fair,” or “poor.”
A third group was given partial information as to progress, each 
student being told his point score on each test, but not shown his test 
paper. The fourth group, however, was given full information, being 
retained at the close of the class hour so that the papers could be 
distributed to them and opportunity given for discovering and dis- 
cussing individual errors. The papers were then collected—the whole 
process usually taking five or ten minutes. This continued for seven 
tests, at the end of which time conditions were reversed. During the 
last four tests the group with full knowledge and the one with no 
knowledge changed places, and likewise the groups with vague and 
partial knowledge were reversed. 

The results of this class experiment are given in Table II, and 
Fig. 3. The mean point scores are shown cumulatively, test by test. 
An examination of the data reveals the surprising fact that nowhere is 
there a statistically significant difference among the four groups. In 
fact, the largest differences are not between the group with full knowl- 
edge and the group with no knowledge, as was to be expected; but, as 
a rule, between the group with partial knowledge and the group with 
full knowledge. However, the maximum difference in terms of per- 
centage is never more than six per cent, and in terms of score, only 
about eight points with a probable error fully as large as the difference 
itself. In other words, the motivating effect of a knowledge of
progress, which appeared so evident in the earlier laboratory experi- 
ments, has disappeared altogether in the actual classroom situation. 

When the writer had sufficiently recovered from the shock of this 
surprising discovery, he began to search for the answer. It has
occurred to him that two factors afford at least a partial explanation.


In the first place, although one group is referred to as having “no
knowledge of progress,” strictly speaking such is not the case. In
all experiments of this type it is manifestly impossible to eliminate 
the subjective impression of the student regarding his progress. While 
other experimenters have overlooked or disregarded this factor, it has
obviously been present. In reality, then, such experiments
involve a comparison of objective knowledge in the form of test scores 
and subjective knowledge in the form of personal impressions. To 
test the accuracy of this subjective knowledge in a subsequent experi- 
ment involving two classes in the same subject, all students who did 
not know their test scores were asked at the close of each test to esti- 
mate the scores they thought they had made. The median correlation 
of these estimates with the actual scores was .71. This indicates that 
the reliability of the students’ estimates of themselves averaged somewhat
higher than did teachers’ marks on essay examinations. Moreover,
there was an interesting regression effect, the poor students rather
consistently overestimating their scores while the good students tended
to underestimate theirs. In the case of students who overestimated
their scores it would appear to be a case of “where ignorance is bliss,
’tis folly to be wise,” for their attitude might very well have been
better under the illusion of success than under the reality of failure. 
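
The regression effect follows from the size of the correlation alone. The sketch below assumes a linear regression of estimate upon actual score, with both expressed in standard units about the same class mean; these assumptions are the editor's and are not stated in the report. The function name `expected_estimate_z` is illustrative.

```python
def expected_estimate_z(actual_z: float, r: float = 0.71) -> float:
    """Expected standing of a student's self-estimate, in standard units,
    given his actual standing and a correlation r between the two."""
    return r * actual_z


for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    est = expected_estimate_z(z)
    if est > z:
        direction = "overestimates"
    elif est < z:
        direction = "underestimates"
    else:
        direction = "matches"
    print(f"actual {z:+.1f} sigma -> expected estimate {est:+.2f} sigma ({direction})")
```

With a correlation of .71 a student two sigma below the class mean would be expected to place himself only about 1.4 sigma below it, and a student two sigma above the mean only about 1.4 sigma above it, which is the pattern reported.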
But, there is a second factor which is much more fundamental, 
namely, that the inherent differences between the laboratory situation 
and the classroom situation are so great as to make the laboratory 
experiment alone an insufficient guide to educational practice. The 
attempt at rigid control is certain to make the laboratory a somewhat 
artificial situation, largely divorced from human interest except for
the specialist. On the other hand, the life situation illustrated by the 
ordinary class test in school is real; each day’s work brings new topics 
and new material, and with the inevitable necessity of making a stand- 
ing in order to make the team, get into a fraternity, or stay in school. 
May it not be, therefore, that a single additional factor such as knowl- 
edge of progress is an event of importance in the monotonous labora- 
tory situation, but only a trivial incident in the already highly 
motivated life situation in school? 

Furthermore, uncertainty as to results may operate differently 
for the many individuals in the two situations. On the one hand, 
practice without knowledge of results in the artificial laboratory 
exercises may easily lead to a slackening of effort and genuine indiffer- 
ence to the outcome of a task regarded as unimportant anyway. On 
the other hand, in the life situation the agony of suspense as to the out- 
come of an event regarded as personally important may be just the 
opposite. Take, for example, the bustling activity and anxious solici- 
tude of the lover, the outcome of whose intentions is still in doubt, in 
contrast with the chronic indifference of the average husband after the 
honeymoon is safely passed. Or, contrast the behavior of the husband 
nervously pacing the corridors of the hospital with the calm resignation 
with which he greets the nurse’s announcement: “Another girl, mother
and daughter resting nicely.” 

In other words, knowledge and suspense in the laboratory are not
the same as knowledge and suspense in real life. The practical impli- 
cation of this is that it is hazardous to generalize from one to the other, 
as we have so often done in the past. On this point, the writer finds 
himself in complete agreement with a statement by Buckingham: 


As long as learning experiments are handled by psychologists alone, we shall 
make slow progress as far as education is concerned . . . The work of the psy- 
chologist is of high value, but it is valuable so far as education is concerned, largely 
because it opens up new fields and sets new problems. These fields will not be
thoroughly explored, nor these problems adequately solved until the psychologist
in the laboratory is supported by the teacher in the classroom.¹


The writer has repeated the above experiment with two other 
classes in the same subject, totaling eighty-eight students. The results 
for these two classes are presented in Table III. A comparison with 





1 Buckingham, B. R.: Research for Teachers. New York: Silver, Burdett and
Company, 1926, pp. 369-370. 
Table II will reveal that although the scores run somewhat higher
in the second experiment, due probably to a change in teaching emphasis,
there is marked agreement between the two experiments. Here
again the differences among the four groups, while in some instances
somewhat greater than in the first experiment, are in no case statistically
significant.

Not content with these results the writer conducted a similar 
experiment in another college subject,¹ and persuaded a colleague to
do the same in a third subject.² In these experiments, involving more
than fifty tests and two hundred ninety-six students in five different 
college classes, not once has there appeared a difference favoring the 
group with full knowledge of progress that meets the minimum require- 
ment of statistical significance, namely, that it be three times its prob- 
able error. Moreover, when the results of the above experiments are 
analyzed further it appears that the knowledge of progress is equally 
impotent in influencing the achievement of students making high scores 
and those making low scores. 
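
The criterion of three times the probable error can be translated into a chance probability. The sketch below assumes a normal sampling distribution and takes one probable error as .6745 of a standard error; the function name `chance_probability` is illustrative and forms no part of the original analysis.

```python
from math import erfc, sqrt

PE_PER_SE = 0.6745  # one probable error expressed in standard errors (normal curve)


def chance_probability(ratio_to_pe: float) -> float:
    """Two-sided probability that chance alone yields a difference at least
    `ratio_to_pe` probable errors from zero, assuming a normal distribution."""
    z = ratio_to_pe * PE_PER_SE
    return erfc(z / sqrt(2))


p_at_one_pe = chance_probability(1.0)    # a difference no larger than its probable error
p_at_three_pe = chance_probability(3.0)  # the minimum requirement named in the text

print(f"difference = 1 P.E.: chance probability {p_at_one_pe:.3f}")
print(f"difference = 3 P.E.: chance probability {p_at_three_pe:.3f}")
```

A difference just equal to its probable error, as in the first class experiment, would arise by chance about half the time, whereas a difference of three probable errors would arise by chance only about four times in a hundred.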

In view of the above facts the writer was forced to the conclusion
that a knowledge of progress, such as is afforded by students seeing
their test papers or hearing their test scores, need not be a factor of
practical significance in the ordinary classroom situation in college.

Recently Brown³ has reported an experiment conducted in a large
urban public school, involving one hundred thirty-eight pupils in 7A
and 5A arithmetic, that appears to present some evidence not in
harmony with the above results and interpretations. However, in
this study the experimental and control groups were in neither grade
equated on the basis of attainment in arithmetic.⁴ Furthermore,
although the mean differences rarely exceeded one arithmetic problem, 
the results were irregular and somewhat inconsistent from day to day, 
being generally less on the tenth day than the first. All things con- 
sidered, most readers would doubtless agree that the author’s warning 





1 A class of sixty students in “Educational Measurement in the High School,”
using Odell’s book as a text.

2 A class of seventy-six students in “School Organization” taught by Dr. Leo M.
Chamberlain, professor of school administration.

3 Brown, F. J.: Knowledge of Results as an Incentive in Schoolroom Practice.
Journal of Educational Psychology, Vol. XXIII, October, 1932, pp. 532-552.

4 In 7A, the groups were selected on the basis of teacher rating and scores on
the Terman Group Test of Mental Ability (in reality the difference was about
fifteen percentile points between the groups), and in 5A the selection was upon a
combination of the teachers’ and principals’ estimates.
is very much in order: “In interpreting such data as are presented
in this experiment, it is necessary to be exceedingly cautious in the
drawing of conclusions.”

It is clear that further experimental work in actual classroom situations
at various educational levels and with pupils of varying degrees
of capacity is greatly needed on this problem. In the meantime 
students of education will do well to bear in mind the wise admonition 
of Josh Billings,¹ namely, “It is better to kno less, than to kno so mutch
that ain’t so.”


1 Billings, Josh: Old Probability. New York, 1879. (Dedication page.) 
A NEW PROOF AND CORRECTED FORMULAE FOR
THE STANDARD ERROR OF A MEAN AND OF A
STANDARD DEVIATION¹


CHARLES C. PETERS AND W. R. VAN VOORHIS


Pennsylvania State College


In this paper we make a different approach to the standard error
of a mean and of a standard deviation from that customarily made,
and generalize the formulae in such manner as to make room for the
very important correlation factor.

Let us conceive of our measures as deviations from the mean of all
the means of a great many random samples, say S samples. Then
for the value of the mean of any one sample, $M_1$, we have

$$M_1 = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

where the x’s are the individual measures and n is the number of them
in the sample. Squaring for this mean,

$$M_1^2 = \left(\frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}\right)^2
= \frac{x_1^2 + x_2^2 + x_3^2 + \cdots + 2x_1x_2 + 2x_1x_3 + \cdots}{n^2}$$

We may write this,

$$n^2 M_1^2 = \sum x^2 + 2\sum\sum x_i x_j$$
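
The identity just written can be checked numerically. The sketch below uses five arbitrary measures chosen for illustration (they are not data from the paper) and compares n² times the squared mean with the sum of squares plus the “tail,” that is, twice the sum of products of all distinct pairs.

```python
from itertools import combinations

x = [3.0, -1.0, 4.0, -2.0, 0.5]   # hypothetical measures, for illustration only
n = len(x)
mean = sum(x) / n

squares = sum(v * v for v in x)                        # sum of the squared measures
tail = 2 * sum(a * b for a, b in combinations(x, 2))   # 2 * sum of x_i * x_j over pairs i < j

lhs = n**2 * mean**2
rhs = squares + tail

print(f"n^2 * M^2           = {lhs:.4f}")
print(f"sum x^2 + 2 sum sum = {rhs:.4f}")
```

The two quantities agree for any set of measures, since the right-hand side is simply the square of the sum of the x’s written out term by term.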


where we mean by the double summation to indicate that a term, as
$x_1$, enters into combination with each of the others to form a series and
also that each of the terms enters with the rest to make such a series,
all of them summed together constituting the tail of the expression.
We shall have similar expressions for $M_2$, $M_3$, etc., though with what
we must take to be different x’s. Summing now for all the samples
and dividing by their number, S, we have as our formula for $n^2$ times
the $\sigma^2$ of the mean

$$\frac{n^2 \sum M^2}{S} = \frac{\sum\left(\sum x^2 + 2\sum\sum x_i x_j\right)}{S}$$





1 This is part of a chapter on reliability from a book on Statistics by these
authors to be published shortly. 
In the conventional proof, followed by Kelley, Jones and others,
it is claimed that the tail of this expression amounts substantially to
zero, the following theorem being cited as proof:¹
But the proof is invalid because the theorem upon which it rests is
inapplicable. For, on the one hand, the means of the measures within
the several sets in which the products are obtained are not zero but
$M_1$; and, on the other hand, the products are not inclusive of all, since
those of the type $x_i x_i$ are definitely withheld and included in the $\sum x^2$’s.
We shall resort to a more round-about, but mathematically defensible,
development to prove that the tail approaches zero as a value. But
meanwhile, noticing that $\sum M^2/S$ is $\sigma_M^2$ and clearing of fractions, we
shall write our formula more simply thus:

$$n^2 S \sigma_M^2 = \sum\left(\sum x^2 + 2\sum\sum x_i x_j\right) \qquad \text{(A)}$$


We have said that S should represent many samples. In order to
exhaust the situation and thus perfect our development we shall make
S all the possible different samples that can be drawn from a total
population of N taken n at a time. These samples must always be
different, but the slightest possible difference will do, merely the
change of a single x in the whole set of n x’s. Reference to the treatment
of the mathematics of choice in a textbook in Algebra will show
that the number of combinations of N things taken n at a time (consequently
the numerical value of S) is given by the formula:

$$\frac{N(N-1)(N-2)(N-3)\cdots(N-n+1)}{n(n-1)(n-2)(n-3)\cdots 1} \qquad \text{(B)}$$


In the set of S samples there will be, as implied above, duplication
of variates. How many duplications? Consider first the part of the
expression containing $\sum x^2$. All the $x^2$’s will, of course, appear in the
summation as a whole, but not every one will appear in each sample.
It will, however, appear as often as combinations can be made of the
other $x^2$’s taken so as to leave room for it; that is, the number of times
it will occur is the total possible number of combinations that can be
made of $N - 1$ things taken $n - 1$ at a time. In all of our future
manipulations in this paper we can use formula (B) as the basic
formula. To learn how many combinations can be made of $N - 1$
things taken $n - 1$ at a time we need only substitute $N - 1$ for N








1 Kelley: “Statistical Method,” p. 84.
The sum of products of measures which are independent of each other and 
whose means are zero, equals zero. 
and $n - 1$ for n. Doing this we shall have, as the frequency with
which each of the $x^2$’s will occur,

$$\frac{(N-1)(N-2)(N-3)\cdots(N-n+1)}{(n-1)(n-2)(n-3)\cdots 1}$$





Since each of the $x^2$’s will occur this same number of times, we may
use it as a coefficient for the summation of all the $x^2$’s. Thus the first
quantity in the right hand member of our equation will have the value,

$$\frac{(N-1)(N-2)(N-3)\cdots(N-n+1)}{(n-1)(n-2)(n-3)\cdots 1}\sum x_N^2$$

where $\sum x_N^2$ is the sum of all the different $x^2$’s in the whole N population.

The double summation, which is the second part of our expression 
in (A), will, of course, recur as a whole the same number of times as 
we take samples, namely S. Within each sample there will be no 
duplication of paired-terms, but successive samples will partly overlap 
and partly differ. So there will be duplications of a given paired-element,
just as in the case of the x’s, but perhaps a different number.
Let us see how many. 

Within each sample the number of different paired-elements is the 
number of possible combinations of n things taken two at a time. If,
in our basic formula, (B), you will substitute n for N and 2 for n, you
will find this number to be

$$\frac{n(n-1)}{2}$$


The number of samples has been already given in formula (B). Therefore
the total number of paired-items in the whole of the tail for all
the samples combined is the product of the number of samples and the
number of items in each sample, namely:

$$\frac{N(N-1)(N-2)(N-3)\cdots(N-n+1)\,n(n-1)}{2\,n(n-1)(n-2)\cdots 1}$$





But the whole number of different paired-items is the number that 
can be made from the whole population of N taken two at a time. 
This, as appropriate substitution in formula (B) will show, is 


$$\frac{N(N-1)}{2}$$

Since all the possible variates occur and with equal frequencies, the
number of times each will occur is the total number divided by the
number of different ones. That is

$$\frac{2N(N-1)(N-2)(N-3)\cdots(N-n+1)\,n(n-1)}{2N(N-1)\,n(n-1)(n-2)(n-3)\cdots 1}$$


Certain of these terms cancel out, leaving as the frequency of occurrence
of each possible different pair, and hence as the coefficient of the
summation constituting the second part of our expression in (A),

$$\frac{(N-2)(N-3)\cdots(N-n+1)}{(n-2)(n-3)\cdots 1}$$
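
Both counting results may be verified by brute enumeration for a small case. The sketch below takes a hypothetical population of N = 6 items in samples of n = 3, lists every possible sample, and counts how often each single item and each pair of items occurs; the counts are then compared with the numbers of combinations of N − 1 things taken n − 1 at a time and of N − 2 things taken n − 2 at a time.

```python
from collections import Counter
from itertools import combinations
from math import comb

N, n = 6, 3   # hypothetical small population and sample size
samples = list(combinations(range(N), n))   # every possible sample

# How often does each item, and each pair of items, occur across all samples?
item_counts = Counter(i for s in samples for i in s)
pair_counts = Counter(p for s in samples for p in combinations(s, 2))

print("number of samples S:      ", len(samples), "expected", comb(N, n))
print("occurrences of each item: ", set(item_counts.values()), "expected", comb(N - 1, n - 1))
print("occurrences of each pair: ", set(pair_counts.values()), "expected", comb(N - 2, n - 2))
print("number of different pairs:", len(pair_counts), "expected", comb(N, 2))
```

Every item occurs the same number of times, and so does every pair, which is what permits each count to be used as a single coefficient for its summation.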


Substituting in (A) the coefficients we found for each of our two
summations,

$$n^2 S \sigma_M^2 = \frac{(N-1)(N-2)\cdots(N-n+1)}{(n-1)(n-2)\cdots 1}\sum x_N^2
+ \frac{(N-2)(N-3)\cdots(N-n+1)}{(n-2)(n-3)\cdots 1}\,2\sum\sum x_{iN}x_{jN} \qquad \text{(C)}$$


Abandoning that line of development for the moment, we may
write:

$$(x_1 + x_2 + x_3 + \cdots + x_N)(x_1 + x_2 + x_3 + \cdots + x_N) = 0,$$

for each quantity in parenthesis sums all of our N items and they aggregate
zero because the measures were taken as deviations from the
mean of the whole set. Multiplying out,

$$x_1^2 + x_2^2 + x_3^2 + \cdots + 2x_1x_2 + 2x_1x_3 + 2x_1x_4 + \cdots + 2x_2x_3 + \cdots = 0.$$

This we may write more briefly as

$$\sum x_N^2 + 2\sum\sum x_{iN}x_{jN} = 0.$$

Therefore, transposing,

$$2\sum\sum x_{iN}x_{jN} = -\sum x_N^2.$$


Both sides of this expression contain all the variates but no duplicates,
which makes them parallel in meaning with those of formula (C).
Substituting this value in (C) we shall have:

$$n^2 S \sigma_M^2 = \frac{(N-1)(N-2)\cdots(N-n+1)}{(n-1)(n-2)\cdots 1}\sum x_N^2
- \frac{(N-2)(N-3)\cdots(N-n+1)}{(n-2)(n-3)\cdots 1}\sum x_N^2$$
Let us examine this expression closely. If we multiply the first
of the members on the right by N/n we shall have the equivalent of S,
for we shall have the formula for the number of combinations that
can be formed of N things taken n at a time. And similarly we
shall have S as the coefficient of the second term if we multiply by
$\frac{N(N-1)}{n(n-1)}$. We can do such multiplying if we indicate a compensating
division or (which amounts to the same thing) a multiplication by the
reciprocals of these terms. Making these adjustments we have:

$$n^2 S \sigma_M^2 = S\,\frac{n}{N}\sum x_N^2 - S\,\frac{n(n-1)}{N(N-1)}\sum x_N^2$$

Dividing through by nS and taking $\sum x_N^2$ out of the parenthesis we have

$$n\sigma_M^2 = \frac{\sum x_N^2}{N}\left(1 - \frac{n-1}{N-1}\right)$$


But N is the number of items out of which $\sum x_N^2$ is constituted, since
this has been so carried as not to include the duplicates. Therefore
$\sum x_N^2/N$ is the $\sigma^2$ of the whole N population, $\sigma_p^2$. Making this
substitution and again dividing by n,

$$\sigma_M^2 = \frac{\sigma_p^2}{n}\left(1 - \frac{n-1}{N-1}\right) \qquad \text{(D)}$$
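
Formula (D) can be tested exactly on a small population by drawing every possible sample. The sketch below uses six hypothetical measures (not data from the paper), computes the mean of each of the possible samples of three, and compares the variance of those means with the value given by (D). The variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [2.0, 5.0, 7.0, 8.0, 11.0, 15.0]   # hypothetical N = 6 measures
N, n = len(population), 3

# Variance of the means of all possible samples of n drawn from the population.
sample_means = [mean(s) for s in combinations(population, n)]
var_of_means = pvariance(sample_means)

# Formula (D): population variance over n, reduced by the fraction (n - 1)/(N - 1).
sigma_p2 = pvariance(population)
formula_d = sigma_p2 / n * (1 - (n - 1) / (N - 1))

print(f"variance of all sample means: {var_of_means:.4f}")
print(f"formula (D):                  {formula_d:.4f}")
```

The two values coincide, which illustrates that (D) is exact for a limited population when the samples are all the different ones that can be drawn.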

Now let N increase indefinitely. That is to say, let the size of 
the population from which samples are drawn become unlimited so that 
the samples drawn from it are purely random ones which do not over- 
lap. Then the value of the fraction in the parenthesis will approach 
zero and, in the limit, 





$$\sigma_M^2 = \frac{\sigma_p^2}{n};$$

and

$$\sigma_M = \frac{\sigma_p}{\sqrt{n}} \qquad \text{(E)}$$


Here the conventional development stops. But it should proceed
a little further. The $\sigma_p$ is the standard deviation of the whole population
whereas the only $\sigma$ we can know, and hence the one we wish to
employ in our formula, is that of the sample we have in hand. This
will be somewhat smaller than that of the whole population. Let us
see.

We have taken our x measures from the mean of all the means.
That is outside of the mean of our sample by just the amount of $M_1$,
and similarly it is outside of the other means. It is a well-known fact
that when a $\sigma$ is computed from an assumed mean which differs from
the true one, a correction must be made as follows:





$$\sigma^2 = \frac{\sum x_a^2}{n} - c^2; \qquad \sum x_t^2 = \sum x_a^2 - nc^2; \qquad \sum x_a^2 = \sum x_t^2 + nc^2.$$

Here $x_t$ is a deviation from the true mean of the sample, while $x_a$ is a
deviation from an outside mean, in this case the mean of all the means.
For us $c$ is $M$. Therefore for any sample

$$\sum x_a^2 = \sum x_t^2 + nM^2$$

and for the S samples

$$\sum\sum x_a^2 = \sum\sum x_t^2 + n\sum M^2.$$

Dividing by the total number of cases, Sn, and assuming that the
$\sum\sum x_t^2$ may be regarded as $S\sum x_t^2$,

$$\frac{\sum\sum x_a^2}{Sn} = \frac{\sum x_t^2}{n} + \frac{\sum M^2}{S}.$$

Therefore

$$\sigma_p^2 = \sigma_s^2 + \sigma_M^2.$$
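
This separation of the population variance into a part within samples and a part between sample means can likewise be checked by enumeration. The sketch below again uses six hypothetical measures and all possible samples of three, and it takes the sample variance in the relation to be the average of the within-sample variances (each computed with divisor n about its own sample mean), which is the sense of the assumption just made. The names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [2.0, 5.0, 7.0, 8.0, 11.0, 15.0]   # hypothetical measures
n = 3
samples = list(combinations(population, n))

sigma_p2 = pvariance(population)                       # variance of the whole population
mean_sigma_s2 = mean(pvariance(s) for s in samples)    # average within-sample variance
sigma_m2 = pvariance([mean(s) for s in samples])       # variance of the sample means

print(f"population variance:            {sigma_p2:.4f}")
print(f"within-sample + between-means:  {mean_sigma_s2 + sigma_m2:.4f}")
```

The average sample variance falls short of the population variance by exactly the variance of the means, which is why a standard deviation computed from a sample in hand runs somewhat small.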


(E) may be put into the form $n\sigma_M^2 = \sigma_p^2$. Substituting for $\sigma_p^2$ its
value from the expression immediately above, we have our true
formula for the standard error of a mean in terms of the standard
deviation of the sample in hand:

$$n\sigma_M^2 = \sigma_s^2 + \sigma_M^2$$

$$\sigma_M^2(n - 1) = \sigma_s^2$$

$$\sigma_M^2 = \frac{\sigma_s^2}{n - 1}$$

$$\sigma_M = \frac{\sigma_s}{\sqrt{n - 1}}$$
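
In practice the corrected formula is a one-line computation. The sketch below, using five hypothetical scores, computes the standard error of the mean from the sample standard deviation (divisor n) and the square root of n − 1, and shows alongside it the uncorrected value with the square root of n. The function name `standard_error_of_mean` is illustrative.

```python
from math import sqrt
from statistics import pstdev, stdev


def standard_error_of_mean(sample):
    """Standard error of the mean: the sample sigma (divisor n) over sqrt(n - 1)."""
    n = len(sample)
    sigma_s = pstdev(sample)   # standard deviation about the sample's own mean, divisor n
    return sigma_s / sqrt(n - 1)


scores = [35.0, 44.0, 50.0, 51.0, 53.0]   # hypothetical scores, for illustration only
se_corrected = standard_error_of_mean(scores)
se_uncorrected = pstdev(scores) / sqrt(len(scores))

print(f"with (n - 1): {se_corrected:.4f}")
print(f"with n:       {se_uncorrected:.4f}")
```

The corrected value is the same as that obtained by computing the standard deviation with divisor n − 1 (`statistics.stdev`) and dividing by the square root of n; with so few cases it is noticeably larger than the uncorrected value.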


This (n — 1) always belongs theoretically to a standard error of a 
mean. However, in educational statistics we customarily neglect it 
because our n’s are so large that the subtraction of a 1 does not make
an appreciable difference. Fisher¹ always carries the $(n - 1)$, because
he is writing chiefly for research workers in the biological sciences who 
seem to work often with small numbers of cases. While it is not worth 
the student’s trouble to make the correction in most statistical practice, 
he should remember that it always theoretically belongs in his formula 
and should employ it whenever, in sufficiently trustworthy measures, 
his n becomes small enough that the correction would make an appre- 
ciable difference. 
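
The size of the $(n - 1)$ correction can be checked numerically. The following is a minimal sketch, not part of the original development; the function names and the sample standard deviation of 10 are assumptions chosen only for illustration.

```python
import math


def se_mean_conventional(sd_sample, n):
    """Standard error of a mean with n under the radical."""
    return sd_sample / math.sqrt(n)


def se_mean_corrected(sd_sample, n):
    """Standard error of a mean with (n - 1) under the radical."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return sd_sample / math.sqrt(n - 1)


# Illustrative values only: a sample standard deviation of 10 points.
for n in (5, 26, 400):
    conventional = se_mean_conventional(10.0, n)
    corrected = se_mean_corrected(10.0, n)
    print(n, round(conventional, 3), round(corrected, 3),
          round(corrected / conventional, 4))
```

The ratio of the two values is $\sqrt{n/(n-1)}$, which is appreciable for a handful of cases and nearly 1 for several hundred.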

Effect of Restricted Selection.—We wish now to direct the attention
of the reader to the $\left(1 - \frac{n-1}{N-1}\right)$ of formula (D). For the formula to hold
in the simple way in which we left it at the close of our last paragraph
above, the N must be very large. That is not always the case. Sup- 
pose, for example, you were taking samples of twenty-five pupils each 
from a total set of fifty pupils. Here n would be half as large as N. 
Disregarding the 1 subtracted from each on the ground that it makes 
little difference with numbers of reasonable size, and also paying no 
attention to the 1 of the $(n - 1)$ under the $\sigma$,

$$\sigma_M^2 = \frac{\sigma^2}{n}(1 - .50).$$

It is obvious that the limitation of the sampling here has a marked
influence in decreasing the size of the standard error of the mean. In
general, if we let $p$ represent the percentage that the sample is of the
whole population from which the samples are drawn,

$$\sigma_M = \frac{\sigma}{\sqrt{n}}\sqrt{1 - p}. \qquad \text{(F)}$$


Ordinarily the research worker will not have occasion to use this for- 
mula as here presented, but it is interesting and important as general- 
izing a principle we shall treat in our next paragraph. 
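
The example of twenty-five pupils drawn from fifty can be worked through as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the standard deviation of 10 is an illustrative value not given in the text.

```python
import math


def restriction_factor_exact(n, N):
    """The factor 1 - (n - 1)/(N - 1), keeping the subtracted 1's."""
    return 1.0 - (n - 1) / (N - 1)


def se_mean_restricted(sd, n, N):
    """Formula (F): sd / sqrt(n) * sqrt(1 - p), with p = n / N."""
    p = n / N
    return sd / math.sqrt(n) * math.sqrt(1.0 - p)


n, N, sd = 25, 50, 10.0  # sd is an assumed value for illustration
print("exact factor:", round(restriction_factor_exact(n, N), 4))
print("approximate factor 1 - p:", 1.0 - n / N)
print("unrestricted standard error:", sd / math.sqrt(n))
print("restricted standard error (F):", round(se_mean_restricted(sd, n, N), 4))
```

With these numbers the exact factor is 25/49, close to the .50 obtained by disregarding the subtracted 1's.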

1 Fisher, R. A.: “Statistical Methods for Research Workers.” Oliver and
Boyd, Edinburgh and London, third edition, 1930.

Standard Error of a Mean in Correlated Series.—In our previous
section we saw that restriction of the population from which samples
are drawn operates to reduce the standard error of the mean, making
it $\sqrt{1 - p}$ times as great as it would be if the samples came from an
unrestricted population. That is because the successive samples
overlap one another to the extent to which they are crowded into a
small total population. We have a special case of such restriction
when the successive samples are matched with an initial one in a
relation that involves correlation. It has been shown¹ that correlation
depends upon overlapping of the correlated samples, but not
necessarily because of narrow boundaries of the total population from
which samples are drawn. Restriction due to correlation would happen,
for example, when a class was retested with the same test or with
a different form of the same test. It would happen equally certainly
if a number of groups matched with an initial group for ability (say, on
intelligence scores) were tested with the same test. In both cases the
successive samples would fluctuate less, and hence have a smaller
standard error, than if they had not been matched with an array with
which they were correlated. We shall undertake to develop a formula
for the standard error of a mean under this condition of correlation.²

Suppose we have a series of $x$-scores and another series of $y$-scores,
the two sets being correlated. The $x$-scores corresponding to $y$-scores
of a given size would scatter so that their standard deviation would be
$\sigma_x\sqrt{1 - r_{xy}^2}$ (standard error of estimate). The $x$'s and the $y$'s may
be any sort of correlated measures, including means. So the $x$-means
corresponding to $y$-means of a given size would scatter in such way
that the measure of their variability would be

$$\sigma_{M_{x \cdot y}} = \sigma_{M_x}\sqrt{1 - r_{M_x M_y}^2}.$$

But we have already shown in this paper that, neglecting the 1 of the
$(n - 1)$, $\sigma_{M_x}$ equals $\sigma_x/\sqrt{n}$, since the $\sigma_{M_x}$ is the standard deviation of an
unrestricted random set of means. It is also known that $r_{M_x M_y}$ equals
$r_{xy}$. Substituting these values we have, since the $y$-subscript has been
employed merely to indicate any level of ability at which we are matching
so that we may now drop it from our symbolism,

$$\sigma_M = \frac{\sigma}{\sqrt{n}}\sqrt{1 - r^2}. \qquad \text{(G)}$$


The $r$ here is the coefficient of correlation between the matching
element and the successive samples. In order to involve this principle
it is only necessary that the groups be matched for equality of means,
since that will force correlation of individuals. However, the $r$ could
not be computed unless individuals were paired as well as means, though
it might be known from previous experience with the measures.

1 Kelley, Truman L.: “Statistical Method.” Pp. 189-191.

2 In the ensuing paragraph we follow the plan of a proof given by Lindquist
in this Journal, Vol. XXII, pages 197-204. But we believe Lindquist makes an
incorrect application of his conclusion.

We have treated the case where the matching is on a fallible cri- 
terion—the criterion measures having a certain unreliability which 
results in our accepting certain sample groups as matched with the 
criterion when truly measured they would not belong. It is known 
(standard error of measurement) that the scatter of correlated meas- 
ures about the true scores with which they should be paired rather than 
about the fallible ones with which they appear to be paired is measured 
by $\sigma_x\sqrt{1 - r}$. The same is true when our correlated measures are
means. Our formula, therefore, where the matching is on a “true”
criterion, would be:

$$\sigma_M = \frac{\sigma}{\sqrt{n}}\sqrt{1 - r}. \qquad \text{(H)}$$


We would have such matching on a “true” criterion where the
same group was to be retested. The variability of the means to be 
expected if we should repeatedly retest the same group is, probably, 
what we usually have in mind when we think of the standard error of a 
mean; hence formula (H) is the one most often to be used. The r 
here is the reliability coefficient of the test. If we have in mind the 
variability to be expected in case we should sample successive groups 
of the same mental age or of the same social status, we should use 
formula (G). Here the $r$ would be the coefficient of correlation between
our test and mental age or social status, either as determined by calcu- 
lation in this case or as known from previous experience with the 
measures. If we have in mind the fluctuation of the means of random 
samples from our population regardless of matching for equality 
in any factor, we should employ formula (E). One can not speak
with precision about the standard error of a mean unless he indicates 
whether he refers to a random sampling, or to a sampling matched on 
a fallible criterion (as when one measures, say, the voluntary reading 
of pupils of the same average educational age), or to repeated testing 
of the same group (which involves matching on an infallible criterion 
since the paired individuals are the same persons, hence of truly equal 
ability). The use of the correct rather than an incorrect formula may 
make a vast difference. The standard deviation of the total score on 
the Stanford Achievement Test in the fourth grade is given by Kelley¹
as ten points and the mean as 32.7. A reliability coefficient of .89 is
claimed in the test-manual for this grade. By the conventional
formula (E) the standard error of the mean would be, for a group of
thirty-six,

$$\sigma_M = \frac{10}{\sqrt{36}} = 1.7.$$

By the correct formula it would be

$$\sigma_M = \frac{10}{\sqrt{36}}\sqrt{1 - .89} = .57.$$

This latter is just about a third of the former.

1 Kelley, Truman L.: “Interpretation of Educational Measurements.” World
Book Company, 1927, pp. 198.

We may now bring our development in this section into harmony
with our mathematical development which culminated in formula (F).
If we had chosen to do so we could have derived formula (H) directly
from formula (F). It would be easy to show, by appeal to the mathematics
of chance, that, if successive samples each include $p$ times the
total population from which they are drawn, the probable overlapping
of the samples is $p$ times the totality of elements of either of the samples.
Kelley has shown that, if two series overlap in such manner that there
is as much of the first that is not the second as there is of the second
that is not the first, the true coefficient of correlation between the series
is exactly equal to the percentage of overlapping. Since that condition
is fulfilled here, we might have deduced formula (H) from formula (F)
by merely substituting $r$ for $p$ in $\frac{\sigma}{\sqrt{n}}\sqrt{1 - p}$.
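
The three formulae for the standard error of a mean can be set side by side on the Stanford Achievement Test figures quoted above (standard deviation 10, group of thirty-six, reliability .89). This is a minimal sketch; the function names are hypothetical, and applying the same .89 in the fallible-criterion formula is an assumption made only to show the comparison.

```python
import math


def se_mean_random(sd, n):
    """Random samples from an unrestricted population: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def se_mean_matched_fallible(sd, n, r):
    """Formula (G): samples matched on a fallible criterion."""
    return sd / math.sqrt(n) * math.sqrt(1.0 - r * r)


def se_mean_matched_true(sd, n, r):
    """Formula (H): matching on a true criterion, as in retesting a group."""
    return sd / math.sqrt(n) * math.sqrt(1.0 - r)


sd, n, r = 10.0, 36, 0.89
print("random sampling:", round(se_mean_random(sd, n), 3))
print("fallible criterion (G):", round(se_mean_matched_fallible(sd, n, r), 3))
print("true criterion (H):", round(se_mean_matched_true(sd, n, r), 3))
print("ratio (H) / random:", round(math.sqrt(1.0 - r), 3))
```

Evaluated without intermediate rounding, the retest value comes to about .55, slightly below the printed .57, and remains about a third of the value for random sampling.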





STANDARD ERROR OF A STANDARD DEVIATION 


We may use the same general method of deriving a formula for the
standard error of a sigma that we employed for the standard error of a
mean. By definition of a standard deviation,

$$\sigma = \sqrt{\frac{\Sigma x^2}{n}}.$$


When we are to find a standard error of a $\sigma$ we are to deal with a set of
theoretical $\sigma$'s each of which differs more or less from a central one
that is the mean of the whole set. We shall handle this by the method
of logarithmic differentiation from the calculus, for the process of
differentiation involves equating for the two sides of an equation just
the sort of deviations that we need. Taking logarithms for the two
sides of our equation expressing the formula for a standard deviation,
we have

$$\log \sigma = \log \sqrt{\frac{\Sigma x^2}{n}} = \frac{1}{2}\log\left(\frac{\Sigma x^2}{n}\right).$$

We shall now take the derivatives for this expression. The derivative
of the log of a quantity is equal to the derivative of the quantity
divided by the quantity itself. Therefore,

$$\frac{d\sigma}{\sigma} = \frac{1}{2}\,\frac{d\left(\frac{\Sigma x^2}{n}\right)}{\frac{\Sigma x^2}{n}} = \frac{1}{2}\,\frac{d\sigma^2}{\sigma^2}.$$

Squaring both sides,

$$\frac{(d\sigma)^2}{\sigma^2} = \frac{1}{4}\,\frac{\left(d\,\frac{\Sigma x^2}{n}\right)^2}{\left(\frac{\Sigma x^2}{n}\right)^2} = \frac{1}{4}\,\frac{(d\sigma^2)^2}{\sigma^4}.$$

This is for a single sample. Summing now for the whole number
of samples, $S$, and dividing by $S$,

$$\frac{\Sigma (d\sigma)^2}{S} = \frac{1}{4\sigma^2}\,\frac{\Sigma (d\sigma^2)^2}{S}.$$

But on the left we have the value of $\sigma_\sigma^2$, for we have the derivatives
(which are the deviations) squared, summed, and divided by the
number of cases. On the right we have the same sort of thing, only
in terms of $\sigma_{\sigma^2}$. Therefore,

$$\sigma_\sigma^2 = \frac{\sigma_{\sigma^2}^2}{4\sigma^2}. \qquad \text{(I)}$$
We must now find a value for $\sigma_{\sigma^2}^2$. We shall develop a general
formula following the method employed in the development of the 
standard error of a mean, and then make simplifications resulting from 
the assumption of normality in our distribution. 

Let $\sigma_0^2$ represent the theoretical standard-deviation-squared that
stands in value at the mean of the whole set of $\sigma^2$'s, and let the
deviations of the several $\sigma^2$'s from this $\sigma_0^2$ be represented by
$d_1$, $d_2$, $d_3$, etc. Then

$$d_1 = \sigma_1^2 - \sigma_0^2;$$

and squaring

$$d_1^2 = (\sigma_1^2 - \sigma_0^2)^2.$$

But

$$\sigma_1^2 = \frac{\Sigma x^2}{n}.$$

Substituting accordingly,

$$d_1^2 = \left(\frac{\Sigma x^2}{n} - \sigma_0^2\right)^2 = \left(\frac{\Sigma x^2}{n} - \frac{n\sigma_0^2}{n}\right)^2 = \left[\frac{(x_1^2 - \sigma_0^2)}{n} + \frac{(x_2^2 - \sigma_0^2)}{n} + \cdots + \frac{(x_n^2 - \sigma_0^2)}{n}\right]^2.$$

In order to carry along less cumbersome notation, we shall represent
the quantities in parentheses by $z_1$, $z_2$, etc.
Then

$$d_1^2 = \frac{(z_1 + z_2 + z_3 + \cdots + z_n)^2}{n^2}.$$

We shall have similar expressions for $d_2$, $d_3$, etc., involving each
of the other samples of the whole set of $S$. If we sum for all of these
squared-deviations and divide by $S$, we shall have

$$\frac{\Sigma d^2}{S} = \sigma_{\sigma^2}^2 = \frac{\Sigma z^2 + 2\Sigma\Sigma z_i z_j}{Sn^2},$$

or

$$Sn^2\sigma_{\sigma^2}^2 = \Sigma z^2 + 2\Sigma\Sigma z_i z_j.$$


But this is precisely similar in form to (A) in our development of the
formula for the standard error of a mean. It will simplify in identically
the same manner, so that we arrive at an equation parallel to (D),

$$\sigma_{\sigma^2}^2 = \frac{\Sigma z^2}{Nn}\left(1 - \frac{n-1}{N-1}\right),$$

where $N$ is the whole population from which the samples are drawn.
Substituting for $z$ the value for which we let it stand, and for $\frac{n-1}{N-1}$,
$p$ in the same sense as used in our development of the formula for
sigma of a mean,

$$\sigma_{\sigma^2}^2 = \frac{\Sigma (x^2 - \sigma_0^2)^2}{Nn}(1 - p) = \frac{1}{n}\left(\frac{\Sigma x^4}{N} - \frac{2\sigma_0^2\,\Sigma x^2}{N} + \frac{\Sigma \sigma_0^4}{N}\right)(1 - p).$$

Multiplying numerator and denominator of the first quantity by $\sigma^4$,
remembering that $\Sigma x^2/N$ is $\sigma^2$ (since the $x^2$'s have been summed for the
whole population, $N$, with no duplicates), and that, in summing, $\sigma_0^4$
was taken $N$ times so that $\Sigma\sigma_0^4$ would equal $N\sigma_0^4$, and taking $\sigma_0^2$ to
be substantially equal to $\sigma^2$,

$$\sigma_{\sigma^2}^2 = \frac{1}{n}\left(\frac{\Sigma x^4}{N\sigma^4}\,\sigma^4 - 2\sigma^4 + \sigma^4\right)(1 - p).$$

But $\frac{\Sigma x^4}{N\sigma^4}$ is $\beta_2$. Therefore

$$\sigma_{\sigma^2}^2 = \frac{1}{n}(\beta_2\sigma^4 - \sigma^4)(1 - p).$$

In a normal distribution $\beta_2$ equals 3. We may safely assume
normality here, since the distribution to which the $\beta_2$ refers is not
that of one of the samples but that of the very large total population,
$N$. Substituting 3 for $\beta_2$ we have

$$\sigma_{\sigma^2}^2 = \frac{1}{n}(3\sigma^4 - \sigma^4)(1 - p) = \frac{2\sigma^4}{n}(1 - p).$$

We may now substitute this value in equation (I), and have

$$\sigma_\sigma^2 = \frac{1}{4\sigma^2}\cdot\frac{2\sigma^4}{n}(1 - p) = \frac{\sigma^2}{2n}(1 - p).$$

Therefore

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}\sqrt{1 - p}.$$


The p here has the same force as in our previous development. 
It is the percentage our sample is of the whole population, hence 
also the percentage of overlapping from sample to sample. If the 
successive samples are correlated with an initial array with which
they are matched, the $p$ represents the coefficient of correlation
between this array and each of the samples, so that we have the
following formulae for the standard error of a standard deviation:

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}\sqrt{1 - r}$$

(Standard error of a standard deviation where samples are matched on
a true criterion, as where we take repeated tests of the same group)

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}\sqrt{1 - r^2}$$

(Standard error of a standard deviation when samples are matched on
a fallible criterion, as where we measure the moral judgment of pupils
of the same average educational age)

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2n}}$$

(Standard error of a standard deviation in case of random samples
from an unrestricted population, where consequently no correlation
factor is present)

The assumptions involved in this development were: (1) That the 
distribution of the whole population drawn upon is normal; (2) that 
the sigma of the sample we have in hand is substantially equal to 
the mean of all the sigmas; and (3) that the sigma of the sample in 
hand is substantially equal to that of the whole population from 
which the samples were drawn. We know that this last assumption 
is not strictly true. In our previous development we found that the
standard deviation of a sample is less than that of the whole population
sampled, so that the sample $\sigma_t^2$ equals $\sigma_a^2 - \sigma_M^2$ (with $\sigma_a$ that
of the whole population), and out of that grew the
$(n - 1)$ instead of the traditional $n$ under the radical in the denominator
of our formula for the standard error of a mean. Had we been
willing to introduce greater complexity into our development we could 
have shown that it would have substantially the same effect here. 
But the correction is a trifling one and customarily need not be made. 
In fact none of the assumptions in the development are such as to 
invalidate our formula to any degree at all appreciable. 
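
The random-sampling formula $\sigma/\sqrt{2n}$ can be checked by drawing repeated samples from a normal population and measuring how much their standard deviations fluctuate. The following is a minimal sketch under stated assumptions: the function names, the population standard deviation of 10, the sample size of 50, and the seed are all illustrative choices, not values from the text.

```python
import math
import random


def se_sd_random(sd, n):
    """Random samples, unrestricted population: sd / sqrt(2n)."""
    return sd / math.sqrt(2 * n)


def se_sd_matched_fallible(sd, n, r):
    """Samples matched on a fallible criterion: factor sqrt(1 - r^2)."""
    return se_sd_random(sd, n) * math.sqrt(1.0 - r * r)


def se_sd_matched_true(sd, n, r):
    """Samples matched on a true criterion: factor sqrt(1 - r)."""
    return se_sd_random(sd, n) * math.sqrt(1.0 - r)


def _sd(values):
    """Standard deviation with n in the denominator, as defined in the text."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def simulated_sd_of_sds(sd, n, samples, seed=0):
    """Draw `samples` normal samples of size n; return the SD of their SDs."""
    rng = random.Random(seed)
    sds = [_sd([rng.gauss(0.0, sd) for _ in range(n)]) for _ in range(samples)]
    return _sd(sds)


if __name__ == "__main__":
    print("formula:", se_sd_random(10.0, 50))
    print("simulation:", round(simulated_sd_of_sds(10.0, 50, 2000, seed=1), 3))
```

The simulation covers only the unrestricted random-sampling case; the two matched cases are given as plain transcriptions of the formulae.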




























AN ANALYSIS OF THE CURVES OF LEARNING AND 
FORGETTING CODE MATERIAL 


HOWARD EASLEY 
Duke University 


The majority of the attainment curves which have been reported 
have taken either the hyperbolic form or a form approximating the 
Gaussian ogive. These have led to the interpretation that in some 
functions learning takes place most rapidly at the beginning of the 
learning process, and in others the size of the increments is small at 
first, increasing to a maximum and diminishing again as the physiological
limit is approached. Thurstone¹ has shown that the two
forms may be due merely to two different ways of plotting the same
function. He has shown that if the curve plotted as speed, Y, against
the amount of practice units, X, gives the hyperbolic form, then the
same data plotted as speed, Y, against the total amount of time
devoted to practice, T, give a Gaussian ogive.

Thurstone selects the hyperbolic form as for practical purposes 
the most available. While undoubtedly a majority of the published 
curves do take this form they are in many cases speed-time curves, 
rather than speed-amount curves which, according to Thurstone’s 
equation, should give this form. There are some objections to taking 
this curve as representing the true rate of progress in learning. In 
the first place, it suggests too easily the interpretation noted above, 
namely that the rate of learning is fastest at the beginning. Meyer²
has pointed out the fallacy of this interpretation but he fails to eliminate
the suggestion entirely. His time curve, which is essentially like
the other published time and error curves, i.e., falling rapidly from the
beginning, does not necessarily lead to this interpretation. But if 
we take his suggestion that the rate of change is slow at first then 
his curve does not readily represent the actual rate of progress in 
learning, which after all is its main function. Then too, if we take 
the hyperbola as representing the true rate of learning, we have to 
assume that the process of learning is fundamentally different in those 
cases which yield the ogive form of the speed-amount curve. 





1 Thurstone, L. L.: The Learning Curve Equation. Psychol. Monog., No. 114, 
1919. 


2 Meyer, M. F. and F. O. Eppright: The Equation of the Learning Function.
Amer. J. Psychol., Vol. XXXIV, 1924, pp. 203-222.

On the other hand if we take the ogive speed-time curve as the
most typical we have the problem of accounting for the fact that a 
great number, if not a majority, of functions investigated have yielded 
the hyperbola, even when plotted in this way. This is the problem 
of the first part of this paper. 

Twenty-four college students practiced ten minutes per day on
Tuesdays, Thursdays, and Saturdays of each week until twenty
practice periods were reached, translating repeatedly a single sentence
in code. The code used was taken from the Stanford revision of the
Binet test. Only one student in the group had ever seen the code,
and she was familiar with it only to the extent of having taken the
test several years previously. The sentence translated was “Pack
my box with five dozen liquor jugs.” This was selected because it
was the shortest sentence known to the writer containing all the
letters of the alphabet. Thus the learning process involved a minimum
of other factors than translation, such as memorizing the sentence.
The code was produced before the subjects on the blackboard and its
use illustrated until all were familiar with the method of translation.
It was then erased and was not shown again during the experiment.
The subjects were asked not to draw the code nor to practice at any
time outside of the regular practice periods. Examination of the
individual records failed to reveal any indication of deviation from
these instructions.

[Fig. 1. Curve of learning with practice periods not equated. Vertical
axis: letters per minute.]

The scores were the number of letters translated correctly per 
minute. At each practice period each subject was told the number 
of errors made the previous day and all were instructed to be as 
accurate as possible, since only correct responses were counted. 
The errors were negligible from the beginning and after the fifth or 
sixth practice period were almost wholly absent. 

The scores for the twenty-four subjects were averaged for each 
practice period. When these are plotted against the number of 
periods, or total time of practice, the curve is approximately a straight 
line, as shown in Fig. 1, with only slight negative acceleration in the 
last few periods. If the experiment had been continued until all the 
subjects reached their limits of improvement this negative acceleration 
would have been more marked, as it is in most learning curves. But 
it is the first part of the curve which is of most interest to us here. 
Do we have here a function which is different from the typical learning 
function, or is there some error in our method of representing it? 

If we assume that the whole learning function is represented in 
our curve we must assume that all our subjects started the formal 
practice at the same stage of the learning process involved, and further- 
more, that this stage is the beginning or zero point for all. These 
assumptions are certainly not justified. While it is true that only 
one subject had had any formal practice and this one only a small 
amount, it is quite probable that the previous experiences of the 
subjects had not prepared them all alike for this problem. In other 
words, it is quite probable that they had had varying amounts of 
informal practice previous to the experiment. Since the amount of 
practice during the experiment was constant for all subjects they 
necessarily became relatively more alike in total amount of practice 
as the experiment progressed. 

In Thurstone’s equation for the speed-time curve the point at
which the rate of change of slope changes from positive to negative
is one-third of the distance from zero to the limit of improvement.
This point then is equivalent for all subjects and, since it is a function
of the final stage of learning rather than the beginning, is independent
of their previous practice. This equation is accepted as probably the
nearest approach to the true learning curve if all subjects were
measured from the zero point of achievement. In the absence of the
true limit of improvement of the subjects the points on each individual
curve which most nearly approximated one-third of the distance from zero
to the highest achievement reached by each individual were taken as
equivalent for all subjects. This involves a certain amount of error
since they were not at the same stage of learning at the end of twenty
practice periods. But this is a closer approximation to really equivalent
points on the various curves than are the beginning or end points,
or any other points selected arbitrarily. This point was reached by
one subject as early as the third practice period, while two others
reached it only on the ninth day of practice, the others reaching it at
various intermediate periods. When the results are plotted we find
the curve in Fig. 2, closely approximating the ogive in the early part.

[Fig. 2. Curve of learning with practice periods equated.]
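
The equating of practice periods described above can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure in detail: the function names are hypothetical, the scores are invented, and the text does not say how periods reached by only some subjects were averaged, so the sketch simply averages whatever scores exist at each shifted position.

```python
def one_third_index(scores):
    """Index of the period whose score lies nearest one-third of the
    distance from zero to the highest score the subject reached."""
    target = max(scores) / 3.0
    return min(range(len(scores)), key=lambda i: abs(scores[i] - target))


def equated_average(curves):
    """Shift each subject's curve so its one-third point falls at offset 0,
    then average the scores available at each offset (an assumption)."""
    sums, counts = {}, {}
    for scores in curves:
        anchor = one_third_index(scores)
        for i, score in enumerate(scores):
            offset = i - anchor
            sums[offset] = sums.get(offset, 0.0) + score
            counts[offset] = counts.get(offset, 0) + 1
    return {offset: sums[offset] / counts[offset] for offset in sorted(sums)}


# Invented scores (letters per minute) for two hypothetical subjects.
curves = [[3, 6, 9, 18], [2, 4, 5, 10, 15]]
print(equated_average(curves))
```

Negative offsets are periods before a subject's one-third point and positive offsets are periods after it, so subjects who began with more informal practice contribute fewer early positions.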

Of the curves for the individual subjects some were positively
accelerated from the beginning, some were negatively accelerated, and
some were, for the first half of the experiment, approximately straight
lines. When these are combined into a single curve without making
any adjustment for differences between stages of the learning process
at which the various subjects started formal practice the rate of change
becomes approximately constant (Fig. 1), as we should expect.

Our next problem was the shape of the curve of forgetting. Pub- 
lished curves of forgetting seem, on the surface, to indicate that for- 
getting takes place most rapidly immediately after practice ceases. 
If this were true it would seem that until the function is well learned, 
as little time as possible should elapse between practice periods, allow- 
ing as little forgetting as possible to be overcome at each subsequent 
practice period. But this has frequently been demonstrated not to 
be the case. 

The published experiments on forgetting have, I believe, invariably
involved a relatively long period between the last practice and the first
measurement of retention. Finkenbinder’s¹ experiment is notable
for the early period of measuring forgetting, but even here the first
forgetting interval is many times the interval between practice periods.
The practice consisted of continuous repetition of a short series of
nonsense syllables. The interval between practice periods could not
be considered to be longer than the time necessary to repeat the whole
series, only a few seconds; yet the first measurement of retention is
made thirty minutes after practice ceased. It would be interesting
to know the course of the curve if measurements had been taken at
shorter intervals within the thirty minutes.

TABLE I

Group    Average    Range
I        53.1       43.3
II       51.9       43.9
III      55.4       44.9
IV       53.1       50.4

In this experiment, the twenty-four subjects were ranked on the 
basis of the averages for the last three performances. This was taken 
as a measure of their degree of learning rather than the last trial in 
order to have a more stable measure. They were then divided into 
four groups as nearly equal in final standing as possible. Each group 
of six subjects contained one from the best four and one from the 
poorest four. The others were selected so as to have the averages and
ranges for the four groups as nearly alike as possible. Table I gives
a comparison of the four groups.

1 Finkenbinder, E. O.: The Curve of Forgetting. Amer. J. Psychol., Vol.
XXIV, 1913, pp. 8-32.

One of these groups was tested with a five-minute exercise five days
after the last practice period, another ten, another seventeen, and the
last one thirty-one days after practice. Their scores were expressed
as percentages of their averages for the last three days of practice.
The curve of forgetting thus plotted is shown in Fig. 3. Here we have,
instead of the typical curve of forgetting, a curve falling very slowly
at first and later more rapidly. If our first measurement had occurred
thirty-one days after practice ceased the curve would have resembled
the typical forgetting curve; but in this case it is evident from the early
measurements that forgetting took place slowly at first, the phase of
rapid forgetting occurring between the seventeenth and thirty-first
days. This curve seems more consistent with the fact that, within
certain limits, practice on a complex function should be distributed
rather than concentrated. It would seem that the intervals between
practice periods should be long enough to include the phase of very slow
forgetting, but not long enough to include the phase of rapid forgetting
which follows.

[Fig. 3. Curve of Forgetting. Vertical axis: per cent.]

It is impossible to make generalizations with certainty from the 
analysis of these particular curves of learning and forgetting. What is 
true of this function may not be true of other functions. It is shown, 
however, that plotted in the usual way our curves would have been like 
those usually found, whereas in this case such curves do not accurately 
represent the true progress of learning and forgetting. It is quite 
probable that a similar analysis would reveal similar errors in the
curves of other learning functions.